# Algorithms & Theory

Algorithms and theory research at CCIS seeks to answer one of the most basic question faced in computer science: Which problems can computers solve efficiently? CCIS aims to find out with rigorous, state-of-the-art research that tackles fundamental algorithm problems with solutions that improve Internet security, data privacy, and network efficiency.

### Areas of investigation:

- Approximation algorithms
- Computational complexity
- Cryptography
- Distributed computing
- Privacy
- Learning theory
- Network algorithms

The CCIS algorithms and theory group is well known for research in cryptography, privacy network optimization, and pseudorandomness. Our faculty works in both the core foundations of computing and in computing applications—making for a broad research environment. Students and faculty collaborate closely, forming an especially cohesive community that offers internal seminars, reading groups, recruiting meetings, and social events.

### Notable achievements:

- CCIS has been awarded a National Science Foundation grant to design algorithms to more efficiently and reliably identify bugs in software and hardware systems such as cell phones, medical devices, and aviation systems.
- Our researchers constructed robust, efficient cryptographic protocols and pseudo-random generators.
- CCIS researchers developed widely used ranking and search algorithms for online documents.
- Our team created protocols that enable individual network nodes to cooperate and optimize for global objectives in adversarial environments.
- CCIS researchers developed an algorithm for a model explaining the emergence of scale-free networks in natural, technological, and social systems.

## Research Labs and Groups

## Related News

## People

Tenured and Tenure Track Faculty

### Agnes H. Chan

Professor

Executive Director of Cyber Security Programs

Tenured and Tenure Track Faculty

### Ehsan Elhamifar

Assistant Professor

Tenured and Tenure Track Faculty

### Larry Finkelstein

Professor Emeritus

Dean Emeritus

Tenured and Tenure Track Faculty

### Huy Lê Nguyen

Assistant Professor

Tenured and Tenure Track Faculty

### Misha Pavel

Professor of the Practice

Interdisciplinary with Bouvé College of Health Sciences

Administrative Staff Tenured and Tenure Track Faculty

### Rajmohan Rajaraman

Professor

Associate Dean & Director of the Graduate School

Tenured and Tenure Track Faculty

### Jonathan Ullman

Assistant Professor

Tenured and Tenure Track Faculty

### Emanuele Viola

Associate Professor

Tenured and Tenure Track Faculty

### Daniel Wichs

Assistant Professor

Research Faculty and Staff

### Carlos Gershenson

Visiting Research Professor

PhD Students

### Hridam Basu

PhD Student

PhD Students

### Albert Cheu

PhD Student

PhD Students

### Matthew Dippel

PhD Student

PhD Students

### Saber Shokat Fadaee

PhD Student

PhD Students

### Dan Feng

PhD Student

PhD Students

### Zahra Jafargholi

PhD Student

PhD Students

### Hamidreza Jahanjou

PhD Student

PhD Students

### Chin Ho Lee

PhD Student

PhD Students

### Mehraneh Liaee

PhD Student

PhD Students

### Biswaroop Maiti

PhD Student

PhD Students

### Tanay Mehta

PhD Student

PhD Students

### Bochao Shen

PhD Student

PhD Students

### Georgios Zirdelis

PhD Student

### Alessandra Scafuro

Postdoctoral Research Associate

## Recent Publications

### Dissimilarity-based Sparse Subset Selection

Dissimilarity-based Sparse Subset Selection, E. Elhamifar, G. Sapiro and S. Sastry, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.

Finding an informative subset of a large number of data points or models is at the center of many problems in machine learning, computer vision, bio/health informatics and image/signal processing. Given pairwise dissimilarities between the elements of a `source set’ and a `target set,’ we consider the problem of finding a subset of the source set, called representatives or exemplars, that can efficiently describe the target set. We formulate the problem as a row-sparsity regularized trace minimization problem. Since the proposed formulation is, in general, an NP-hard problem, we consider a convex relaxation. The solution of our proposed optimization program finds the representatives and the probability that each element of the target set is associated with the representatives. We analyze the solution of our proposed optimization as a function of the regularization parameter. We show that when the two sets jointly partition into multiple groups, the solution of our proposed optimization program finds representatives from all groups and reveals clustering of the sets. In addition, we show that our proposed formulation can effectively deal with outliers. Our algorithm works with arbitrary dissimilarities, which can be asymmetric or violate the triangle inequality. To efficiently implement our proposed algorithm, we consider an Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. We show that the ADMM implementation allows to parallelize the algorithm, hence further reducing the computational cost. Finally, by experiments on real-world datasets, we show that our proposed algorithm improves the state of the art on the two problems of scene categorization using representative images and time-series modeling and segmentation using representative models.

### Robust Subspace Clustering

Robust Subspace Clustering, M. Soltanolkotabi, E. Elhamifar and E. J. Candes, Annals of Statistics, 2014.

Abstract

Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [18] to cluster noisy data, and develops some novel theory demonstrating its correctness. In particular, the theory uses ideas from geometric functional analysis to show that the algorithm can accurately recover the underlying subspaces under minimal requirements on their orientation, and on the number of samples per subspace. Synthetic as well as real data experiments complement our theoretical study, illustrating our approach and demonstrating its effectiveness.

### Sparse Subspace Clustering: Algorithm, Theory, and Applications

Sparse Subspace Clustering: Algorithm, Theory, and Applications, E. Elhamifar and R. Vidal, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2013.

In many real-world problems, we are dealing with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories the data belongs to. In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of subspaces and the distribution of data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm can be solved efficiently and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.

### Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery, E. Elhamifar, G. Sapiro and R. Vidal, Neural Information Processing Systems (NIPS), 2012.

Abstract

Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can effi- ciently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data.

### Block-Sparse Recovery via Convex Optimization

Block-Sparse Recovery via Convex Optimization, E. Elhamifar and R. Vidal, IEEE Transactions on Signal Processing (TSP), 2012.

Given a dictionary that consists of multiple blocks and a signal that lives in the range space of only a few blocks, we study the problem of finding a block-sparse representation of the signal, i.e., a representation that uses the minimum number of blocks. Motivated by signal/image processing and computer vision applications, such as face recognition, we consider the block-sparse recovery problem in the case where the number of atoms in each block is arbitrary, possibly much larger than the dimension of the underlying subspace. To find a block-sparse representation of a signal, we propose two classes of non-convex optimization programs, which aim to minimize the number of nonzero coefficient blocks and the number of nonzero reconstructed vectors from the blocks, respectively. Since both classes of problems are NP-hard, we propose convex relaxations and derive conditions under which each class of the convex programs is equivalent to the original non-convex formulation. Our conditions depend on the notions of mutual and cumulative subspace coherence of a dictionary, which are natural generalizations of existing notions of mutual and cumulative coherence. We evaluate the performance of the proposed convex programs through simulations as well as real experiments on face recognition. We show that treating the face recognition problem as a block-sparse recovery problem improves the state-of-the-art results by 10% with only 25% of the training data.

### Sparse Manifold Clustering and Embedding

Sparse Manifold Clustering and Embedding, E. Elhamifar and R. Vidal, Neural Information Processing Systems (NIPS), 2011.

Abstract

We propose an algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds. Similar to most dimensionality reduction methods, SMCE finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. The key difference is that SMCE finds both the neighbors and the weights automatically. This is done by solving a sparse optimization problem, which encourages selecting nearby points that lie in the same manifold and approximately span a low-dimensional affine subspace. The optimal solution encodes information that can be used for clustering and dimensionality reduction using spectral clustering and embedding. Moreover, the size of the optimal neighborhood of a data point, which can be different for different points, provides an estimate of the dimension of the manifold to which the point belongs. Experiments demonstrate that our method can effectively handle multiple manifolds that are very close to each other, manifolds with non-uniform sampling and holes, as well as estimate the intrinsic dimensions of the manifolds.

## Select Projects

### Network Algorithms Under Adversarial and Stochastic Uncertainty

#### Lead PI

#### Funding

### Network Algorithms Under Adversarial and Stochastic Uncertainty

This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.

Modern information networks are composed of heterogeneous nodes and links, whose capacities and capabilities change unexpectedly due to mobility, failures, maintenance, and adversarial attacks. User demands and critical infrastructure needs, however, require that basic primitives including access to information and services be always efficient and reliable. This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.

The focus of this project will be on two problem domains, which together address adversarial network dynamics and stochastic network failures. The first component is a comprehensive theory of information spreading in dynamic networks. The PI will develop an algorithmic toolkit for dynamic networks, including local gossip-style protocols, network coding, random walks, and other diffusion processes. The second component of the project concerns failure-aware network algorithms that provide high availability in the presence of unexpected and correlated failures. The PI will study failure-aware placement of critical resources, and develop flow and cut algorithms under stochastic failures using techniques from chance-constrained optimization. Algorithms tolerant to adversarial and stochastic uncertainty will play a critical role in large-scale heterogeneous information networks of the future. Broader impacts include student training and curriculum development.

### The Control Principles of Complex Systems

#### Lead PI

#### Co PI

#### Funding

### The Control Principles of Complex Systems

Many of the truly difficult problems limiting advances in contemporary science are rooted in our limited understanding of how complex systems are controlled. Indeed, in human cells millions of molecules are embedded in a complex genetic network that lacks an obvious controller; in society billions of individuals interact with each other through intricate trust-family-friendship-professional-association based networks apparently controlled by no one; economic change is driven by what economists call the “invisible hand of the market”, reflecting a lack of understanding of the control principles that govern the interactions between individuals, companies, banks and regulatory agencies.

These and many other examples raise several fundamental questions: What are the control principles of complex systems? How do complex systems organize themselves to achieve sufficient control to ensure functionality? This proposal is motivated by the hypothesis that the architecture of many complex systems is driven by the system’s need to achieve sufficient control to maintain its basic functions. Hence uncovering the control principles of complex self-organized systems can help us understand the fundamental laws that govern them.

### The Integrative Genomics of Acute Asthma Control

#### Lead PI

#### Co PI

Benjamin Alexander Raby

Frank D. Gilliland

#### Funding

### The Integrative Genomics of Acute Asthma Control

Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control.

The over-arching hypothesis of this proposal is that inter-individual differences in asthma control result from the complex interplay of both environmental, genomic, and socioeconomic factors organized in discrete, scale-free molecular networks. Though strict patient compliance with asthma controller therapy and avoidance of environmental triggers are important strategies for the prevention of asthma exacerbation, failure to maintain control is the most common health-related cause of lost school and workdays. Therefore, better understanding of the molecular underpinnings and the role of environmental factors that lead to poor asthma control is needed. Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control. This proposal consists three Specific Aims, each consisting of three investigational phases: (i) an initial computational discovery phase to define specific molecular networks using the Asthma BRIDGE datasets, followed by two validation phases – (ii) a computational validation phase using an independent clinical cohort, and (iii) an experimental phase to validate critical molecular edges (gene-gene interactions) that emerge from the defined molecular network.

In Specific Aim 1, we will use the Asthma BRIDGE datasets to define interactome sub-module perturbed in poor asthma control;the regulatory variants that modulate this asthma-control module;and to develop a predictive model of asthma control.

In Specific Aim 2, we will study the effects exposure to air pollution and environmental tobacco smoke on modulating the asthma control networks, testing for environment-dependent alterations in network dynamics.

In Specific Aim 3, we will study the impact of inhaled corticosteroids (ICS – the most efficacious asthma-controller medication) on network dynamics of the asthma-control sub-module by comparing network topologies of acute asthma control between subjects taking ICS to those not on ICS. For our experimental validations, we will assess relevant gene-gene interactions by shRNA studies bronchial epithelial and Jurkat T- cell lines. Experimental validations of findings from Aim 2 will be performed by co-treating cells with either cigarette smoke extract (CSE) or ozone. Similar studies will be performed with co-treatment using dexamethasone to validate findings from Aim 2. From the totality of these studies, we will gain new insights into the pathobiology of poor asthma control, and define targets for biomarker development and therapeutic targeting.

Public Health Relevance

Failure to maintain tight asthma symptom control is a major health-related cause of lost school and workdays. This project aims to use novel statistical network-modeling approaches to model the molecular basis of poor asthma control in a well-characterized cohort of asthmatic patients with available genetic, gene expression, and DNA methylation data. Using this data, we will define an asthma-control gene network, and the genetic, epigenetic, and environmental factors that determine inter-individual differences in asthma control.