September 19 9:00 am - 10:00 am EDT
Title: Scalable Ordinal Embedding to Model Text Similarity
Speaker: Jesse Anderton, PhD Candidate, College of Computer and Information Science at Northeastern University
Location: Northeastern University, 440 Huntington Avenue, West Village H, 1st Floor, Room #166, Boston, Massachusetts 02115
Practitioners of machine learning and related fields commonly seek embeddings of object collections into some Euclidean space. These embeddings are useful for dimensionality reduction, for data visualization, as concrete representations of abstract notions of similarity for similarity search, or as features for downstream learning tasks such as web search or sentiment analysis. A wide array of such techniques exists, ranging from the traditional (PCA, MDS) to the trendy (word2vec, deep learning).
While most existing techniques rely on preserving some type of exact numeric data (feature values, or estimates of various statistics), Anderton proposes to develop and apply large-scale techniques for embedding and similarity search using purely ordinal data (e.g., “object a is more similar to b than to c”). Recent theoretical advances show that ordinal data need not lose information: when such comparisons are carefully applied to an appropriate dataset, there is an embedding satisfying them that is unique up to similarity transforms (scaling, translation, reflection, and rotation). Moreover, ordinality is often a more natural way to express the common goal of finding an embedding that preserves some notion of similarity without taking noisy statistical estimates too literally.
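One direction of the uniqueness claim can be checked with a small sketch: the triplet comparisons ("a is closer to b than to c") induced by a point set do not change under a similarity transform, because such a transform multiplies every distance by the same positive factor. The helper names and the random 2-D points below are hypothetical, chosen only for illustration. The sketch does not demonstrate the harder converse (that enough ordinal data pins the embedding down up to such transforms), which is the theoretical result referred to above.

```python
import itertools
import math
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def triplets(points):
    """All ordinal triplets (a, b, c) with a strictly closer to b than to c."""
    n = len(points)
    out = set()
    for a in range(n):
        others = [i for i in range(n) if i != a]
        for b, c in itertools.permutations(others, 2):
            if sq_dist(points[a], points[b]) < sq_dist(points[a], points[c]):
                out.add((a, b, c))
    return out

def similarity_transform(points, scale, angle, shift, reflect=False):
    """2-D similarity transform: optional reflection, rotation, uniform scaling, translation."""
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    moved = []
    for x, y in points:
        if reflect:
            x = -x  # reflect across the y-axis
        xr = cos_t * x - sin_t * y
        yr = sin_t * x + cos_t * y
        moved.append((scale * xr + shift[0], scale * yr + shift[1]))
    return moved

# Hypothetical example data: 8 random points in the unit square around the origin.
random.seed(0)
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(8)]
moved = similarity_transform(pts, scale=3.5, angle=0.7, shift=(10.0, -4.0), reflect=True)
print(triplets(pts) == triplets(moved))
```

A non-uniform rescaling (for example, stretching only the x-axis) is not a similarity transform and can reorder distances, so it may change the triplet set.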
The work Anderton proposes focuses on three tasks: selecting the minimal ordinal data needed to produce a high-quality embedding, embedding large-scale datasets of high dimensionality, and developing ordinal embeddings that depend on contextual features for applications such as recommender systems.
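To make "embedding from ordinal data" concrete, here is a minimal sketch of one generic approach: stochastic gradient descent on a hinge loss over triplets, in the spirit of generalized non-metric MDS. It is not the method proposed in this work; the function names, margin, learning rate, and epoch count are hypothetical choices for illustration. Updating on one triplet at a time keeps each step's cost independent of the total number of triplets, which is one reason this style of method is a common starting point for large triplet sets.

```python
import itertools
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def ordinal_embed(n, triplets, dim=2, margin=1.0, lr=0.02, epochs=300, seed=0):
    """Place n points in R^dim so that, as far as possible, each triplet (a, b, c)
    satisfies d(a, b)^2 + margin <= d(a, c)^2.

    Plain SGD on the hinge loss max(0, margin + d(a,b)^2 - d(a,c)^2),
    one triplet per update. No regularizer is used, so the overall scale is free.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    order = list(triplets)
    for _ in range(epochs):
        rng.shuffle(order)
        for a, b, c in order:
            if margin + sq_dist(X[a], X[b]) - sq_dist(X[a], X[c]) <= 0:
                continue  # satisfied with margin: hinge gradient is zero
            for k in range(dim):
                ab = X[a][k] - X[b][k]
                ac = X[a][k] - X[c][k]
                # Partial derivatives of d(a,b)^2 - d(a,c)^2 w.r.t. coordinate k.
                X[a][k] -= lr * (2 * ab - 2 * ac)
                X[b][k] -= lr * (-2 * ab)  # pull b toward a
                X[c][k] -= lr * (2 * ac)   # push c away from a
    return X

def all_triplets(points):
    """Every (a, b, c) such that a is strictly closer to b than to c."""
    n = len(points)
    return [
        (a, b, c)
        for a in range(n)
        for b, c in itertools.permutations([i for i in range(n) if i != a], 2)
        if sq_dist(points[a], points[b]) < sq_dist(points[a], points[c])
    ]

def satisfied_fraction(X, triplets):
    """Share of triplets whose ordering the embedding X reproduces."""
    ok = sum(sq_dist(X[a], X[b]) < sq_dist(X[a], X[c]) for a, b, c in triplets)
    return ok / len(triplets)

# Usage: recover a layout of 10 hypothetical points from triplet comparisons alone.
gen = random.Random(1)
truth = [(gen.uniform(-1, 1), gen.uniform(-1, 1)) for _ in range(10)]
T = all_triplets(truth)
X = ordinal_embed(len(truth), T)
print(f"{len(T)} triplets, {satisfied_fraction(X, T):.2%} satisfied")
```

Because only comparisons are used, the recovered layout can at best match the original up to a similarity transform. Practical methods typically add a norm constraint or regularizer to control scale and sample triplets rather than enumerating all of them.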
About the Speaker
Jesse Anderton is a PhD candidate in Northeastern University’s College of Computer and Information Science, advised by Professor Javed Aslam. He studies information retrieval and machine learning approaches to interpreting and modeling user behavior and human opinion. His work examines the information bubble, the tendency of people to focus on online content that supports their opinions, and takes an interdisciplinary approach to modeling and predicting how people might differ in opinion on one topic given the other views they hold.
Committee
- Professor Javed Aslam, Professor, Senior Associate Dean of Academic Affairs, College of Computer and Information Science (CCIS) at Northeastern University (Advisor)
- Professor David Smith, Assistant Professor, College of Computer and Information Science (CCIS) at Northeastern University
- Professor Byron Wallace, Assistant Professor, College of Computer and Information Science (CCIS) at Northeastern University
- Fernando Diaz, Research, Spotify