## Contact

## Office Location

440 Huntington Avenue

310E West Village H

Boston, MA 02115

## Mailing Address

Northeastern University

ATTN: Ehsan Elhamifar, 202 WVH

360 Huntington Avenue

Boston, MA 02115

## Research Interests

- Machine Learning: subset selection and summarization for massive complex datasets, high-dimensional multi-manifold data analysis, learning nonlinear dynamic models, deep learning
- Computer Vision: motion and activity segmentation, object recognition and detection, video summarization, active learning for visual data
- Optimization: sparse and low-rank recovery algorithms, convex programming, submodular optimization

## Education

- PhD in Electrical and Computer Engineering, Johns Hopkins University
- MS in Applied Mathematics and Statistics, Johns Hopkins University
- MS in Electrical Engineering, Sharif University of Technology

## Biography

Ehsan Elhamifar is an Assistant Professor in the College of Computer and Information Science and is affiliated with the Department of Electrical and Computer Engineering at Northeastern University. Previously, he was a postdoctoral scholar in the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley. He obtained his PhD in Electrical and Computer Engineering from the Johns Hopkins University in 2012. Dr. Elhamifar obtained two Master’s degrees, one in Electrical Engineering from Sharif University of Technology in Iran in 2006 and another in Applied Mathematics and Statistics from the Johns Hopkins University in 2010. He was a visiting researcher at Stanford University, University of Minnesota and Duke University for several months during 2011 and 2012.

Prof. Elhamifar’s research areas are machine learning, computer vision, optimization and algorithms. He is broadly interested in developing efficient, robust and provable algorithms that can address challenges of complex and large-scale high-dimensional data. He works on applications of these tools in computer vision and robotics. Specifically, he uses tools from convex geometry and analysis, optimization, sparse and low-rank modeling, high-dimensional statistics and graph theory to develop algorithms and theory and applies them to solve real-world problems, including motion and activity segmentation in videos, object detection and recognition, video summarization, active learning and more.

## Recent Publications

### Dissimilarity-based Sparse Subset Selection

Dissimilarity-based Sparse Subset Selection, E. Elhamifar, G. Sapiro and S. Sastry, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.

Finding an informative subset of a large number of data points or models is at the center of many problems in machine learning, computer vision, bio/health informatics and image/signal processing. Given pairwise dissimilarities between the elements of a `source set’ and a `target set,’ we consider the problem of finding a subset of the source set, called representatives or exemplars, that can efficiently describe the target set. We formulate the problem as a row-sparsity regularized trace minimization problem. Since the proposed formulation is, in general, an NP-hard problem, we consider a convex relaxation. The solution of our proposed optimization program finds the representatives and the probability that each element of the target set is associated with the representatives. We analyze the solution of our proposed optimization as a function of the regularization parameter. We show that when the two sets jointly partition into multiple groups, the solution of our proposed optimization program finds representatives from all groups and reveals clustering of the sets. In addition, we show that our proposed formulation can effectively deal with outliers. Our algorithm works with arbitrary dissimilarities, which can be asymmetric or violate the triangle inequality. To efficiently implement our proposed algorithm, we consider an Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. We show that the ADMM implementation allows to parallelize the algorithm, hence further reducing the computational cost. Finally, by experiments on real-world datasets, we show that our proposed algorithm improves the state of the art on the two problems of scene categorization using representative images and time-series modeling and segmentation using representative models.

### Energy Disaggregation via Learning ‘Powerlets’ and Sparse Coding

Energy Disaggregation via Learning ‘Powerlets’ and Sparse Coding, E. Elhamifar and S. Sastry, AAAI Conference on Artificial Intelligence (AAAI), 2015.

Abstract

In this paper, we consider the problem of energy disaggregation, i.e., decomposing a whole home electricity signal into its component appliances. We propose a new supervised algorithm, which in the learning stage, automatically extracts signature consumption patterns of each device by modeling the device as a mixture of dynamical systems. In order to extract signature consumption patterns of a device corresponding to its different modes of operation, we define appropriate dissimilarities between energy snippets of the device and use them in a subset selection scheme, which we generalize to deal with time-series data. We then form a dictionary that consists of extracted power signatures across all devices. We cast the disaggregation problem as an optimization over a representation in the learned dictionary and incorporate several novel priors such as device-sparsity, knowledge about devices that do or do not work together as well as temporal consistency of the disaggregated solution. Real experiments on a publicly available energy dataset demonstrate that our proposed algorithm achieves promising results for energy disaggregation.

### Robust Subspace Clustering

Robust Subspace Clustering, M. Soltanolkotabi, E. Elhamifar and E. J. Candes, Annals of Statistics, 2014.

Abstract

Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [18] to cluster noisy data, and develops some novel theory demonstrating its correctness. In particular, the theory uses ideas from geometric functional analysis to show that the algorithm can accurately recover the underlying subspaces under minimal requirements on their orientation, and on the number of samples per subspace. Synthetic as well as real data experiments complement our theoretical study, illustrating our approach and demonstrating its effectiveness.

## Previous Publications

### 2013

### Sparse Subspace Clustering: Algorithm, Theory, and Applications

Sparse Subspace Clustering: Algorithm, Theory, and Applications, E. Elhamifar and R. Vidal, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2013.

In many real-world problems, we are dealing with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories the data belongs to. In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of subspaces and the distribution of data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm can be solved efficiently and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.

### A Convex Optimization Framework for Active Learning

A Convex Optimization Framework for Active Learning, E. Elhamifar, G. Sapiro, A. Yang and S. Sastry, International Conference on Computer Vision (ICCV), 2013.

Abstract

In many image/video/web classification problems, we have access to a large number of unlabeled samples. However, it is typically expensive and time consuming to obtain labels for the samples. Active learning is the problem of progressively selecting and annotating the most informative unlabeled samples, in order to obtain a high classi- fication performance. Most existing active learning algorithms select only one sample at a time prior to retraining the classifier. Hence, they are computationally expensive and cannot take advantage of parallel labeling systems such as Mechanical Turk. On the other hand, algorithms that allow the selection of multiple samples prior to retraining the classifier, may select samples that have significant information overlap or they involve solving a non-convex optimization. More importantly, the majority of active learning algorithms are developed for a certain classifier type such as SVM. In this paper, we develop an efficient active learning framework based on convex programming, which can select multiple samples at a time for annotation. Unlike the state of the art, our algorithm can be used in conjunction with any type of classifiers, including those of the family of the recently proposed Sparse Representation-based Classification (SRC). We use the two principles of classi- fier uncertainty and sample diversity in order to guide the optimization program towards selecting the most informative unlabeled samples, which have the least information overlap. Our method can incorporate the data distribution in the selection process by using the appropriate dissimilarity between pairs of samples. We show the effectiveness of our framework in person detection, scene categorization and face recognition on real-world datasets.

### 2012

### Block-Sparse Recovery via Convex Optimization

Block-Sparse Recovery via Convex Optimization, E. Elhamifar and R. Vidal, IEEE Transactions on Signal Processing (TSP), 2012.

Given a dictionary that consists of multiple blocks and a signal that lives in the range space of only a few blocks, we study the problem of finding a block-sparse representation of the signal, i.e., a representation that uses the minimum number of blocks. Motivated by signal/image processing and computer vision applications, such as face recognition, we consider the block-sparse recovery problem in the case where the number of atoms in each block is arbitrary, possibly much larger than the dimension of the underlying subspace. To find a block-sparse representation of a signal, we propose two classes of non-convex optimization programs, which aim to minimize the number of nonzero coefficient blocks and the number of nonzero reconstructed vectors from the blocks, respectively. Since both classes of problems are NP-hard, we propose convex relaxations and derive conditions under which each class of the convex programs is equivalent to the original non-convex formulation. Our conditions depend on the notions of mutual and cumulative subspace coherence of a dictionary, which are natural generalizations of existing notions of mutual and cumulative coherence. We evaluate the performance of the proposed convex programs through simulations as well as real experiments on face recognition. We show that treating the face recognition problem as a block-sparse recovery problem improves the state-of-the-art results by 10% with only 25% of the training data.

### Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery, E. Elhamifar, G. Sapiro and R. Vidal, Neural Information Processing Systems (NIPS), 2012.

Abstract

Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can effi- ciently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data.

### See All by Looking at A Few: Sparse Modeling for Finding Representative Objects

See All by Looking at A Few: Sparse Modeling for Finding Representative Objects, E. Elhamifar, G. Sapiro and R. Vidal, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

Abstract

We consider the problem of finding a few representatives for a dataset, i.e., a subset of data points that efficiently describes the entire dataset. We assume that each data point can be expressed as a linear combination of the representatives and formulate the problem of finding the representatives as a sparse multiple measurement vector problem. In our formulation, both the dictionary and the measurements are given by the data matrix, and the unknown sparse codes select the representatives via convex optimization. In general, we do not assume that the data are lowrank or distributed around cluster centers. When the data do come from a collection of low-rank models, we show that our method automatically selects a few representatives from each low-rank model. We also analyze the geometry of the representatives and discuss their relationship to the vertices of the convex hull of the data. We show that our framework can be extended to detect and reject outliers in datasets, and to efficiently deal with new observations and large datasets. The proposed framework and theoretical foundations are illustrated with examples in video summarization and image classification using representatives.

### Sparse Hidden Markov Models for Surgical Gesture Classification and Skill Evaluation

Sparse Hidden Markov Models for Surgical Gesture Classification and Skill Evaluation, L. Tao, E. Elhamifar, S. Khudanpur, G. Hager, and R. Vidal, Information Processing in Computer Assisted Interventions (IPCAI), 2012.

#### Abstract

We consider the problem of classifying surgical gestures and skill level in robotic surgical tasks. Prior work in this area models gestures as states of a hidden Markov model (HMM) whose observations are discrete, Gaussian or factor analyzed. While successful, these approaches are limited in expressive power due to the use of discrete or Gaussian observations. In this paper, we propose a new model called sparse HMMs whose observations are sparse linear combinations of elements from a dictionary of basic surgical motions. Given motion data from many surgeons with different skill levels, we propose an algorithm for learning a dictionary for each gesture together with an HMM grammar describing the transitions among different gestures. We then use these dictionaries and the grammar to represent and classify new motion data. Experiments on a database of surgical motions acquired with the da Vinci system show that our method performs on par with or better than state-of-the-art methods.This suggests that learning a grammar based on sparse motion dictionaries is important in gesture and skill classification.

### 2011

### Sparse Manifold Clustering and Embedding

Sparse Manifold Clustering and Embedding, E. Elhamifar and R. Vidal, Neural Information Processing Systems (NIPS), 2011.

Abstract

We propose an algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds. Similar to most dimensionality reduction methods, SMCE finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. The key difference is that SMCE finds both the neighbors and the weights automatically. This is done by solving a sparse optimization problem, which encourages selecting nearby points that lie in the same manifold and approximately span a low-dimensional affine subspace. The optimal solution encodes information that can be used for clustering and dimensionality reduction using spectral clustering and embedding. Moreover, the size of the optimal neighborhood of a data point, which can be different for different points, provides an estimate of the dimension of the manifold to which the point belongs. Experiments demonstrate that our method can effectively handle multiple manifolds that are very close to each other, manifolds with non-uniform sampling and holes, as well as estimate the intrinsic dimensions of the manifolds.

### Robust Classification using Structured Sparse Representation

Robust Classification using Structured Sparse Representation, E. Elhamifar and R. Vidal, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

Abstract

In many problems in computer vision, data in multiple classes lie in multiple low-dimensional subspaces of a highdimensional ambient space. However, most of the existing classification methods do not explicitly take this structure into account. In this paper, we consider the problem of classification in the multi-subspace setting using sparse representation techniques. We exploit the fact that the dictionary of all the training data has a block structure where the training data in each class form few blocks of the dictionary. We cast the classification as a structured sparse recovery problem where our goal is to find a representation of a test example that uses the minimum number of blocks from the dictionary. We formulate this problem using two different classes of non-convex optimization programs. We propose convex relaxations for these two non-convex programs and study conditions under which the relaxations are equivalent to the original problems. In addition, we show that the proposed optimization programs can be modified properly to also deal with corrupted data. To evaluate the proposed algorithms, we consider the problem of automatic face recognition. We show that casting the face recognition problem as a structured sparse recovery problem can improve the results of the state-of-the-art face recognition algorithms, especially when we have relatively small number of training data for each class. In particular, we show that the new class of convex programs can improve the state-ofthe-art face recognition results by 10% with only 25% of the training data. In addition, we show that the algorithms are robust to occlusion, corruption, and disguise.

### 2009

### Sparse Subspace Clustering

Sparse Subspace Clustering, E. Elhamifar and R. Vidal, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

Abstract

We propose a method based on sparse representation (SR) to cluster data drawn from multiple low-dimensional linear or affine subspaces embedded in a high-dimensional space. Our method is based on the fact that each point in a union of subspaces has a SR with respect to a dictionary formed by all other data points. In general, finding such a SR is NP hard. Our key contribution is to show that, under mild assumptions, the SR can be obtained ’exactly’ by using !1 optimization. The segmentation of the data is obtained by applying spectral clustering to a similarity matrix built from this SR. Our method can handle noise, outliers as well as missing data. We apply our subspace clustering algorithm to the problem of segmenting multiple motions in video. Experiments on 167 video sequences show that our approach significantly outperforms state-of-the-art methods.

## CCIS PhD Students & Post-Docs

PhD Students

### Clara De Paolis Kaluza

PhD Student

PhD Students

### Mahdi Khodayar

PhD Student