# Algorithms & Theory

Algorithms and theory research at CCIS seeks to answer one of the most basic question faced in computer science: Which problems can computers solve efficiently? CCIS aims to find out with rigorous, state-of-the-art research that tackles fundamental algorithm problems with solutions that improve Internet security, data privacy, and network efficiency.

### Areas of investigation:

- Approximation algorithms
- Computational complexity
- Cryptography
- Distributed computing
- Privacy
- Learning theory
- Network algorithms

The CCIS algorithms and theory group is well known for research in cryptography, privacy network optimization, and pseudorandomness. Our faculty works in both the core foundations of computing and in computing applications—making for a broad research environment. Students and faculty collaborate closely, forming an especially cohesive community that offers internal seminars, reading groups, recruiting meetings, and social events.

### Notable achievements:

- CCIS has been awarded a National Science Foundation grant to design algorithms to more efficiently and reliably identify bugs in software and hardware systems such as cell phones, medical devices, and aviation systems.
- Our researchers constructed robust, efficient cryptographic protocols and pseudo-random generators.
- CCIS researchers developed widely used ranking and search algorithms for online documents.
- Our team created protocols that enable individual network nodes to cooperate and optimize for global objectives in adversarial environments.
- CCIS researchers developed an algorithm for a model explaining the emergence of scale-free networks in natural, technological, and social systems.

## Research Labs and Groups

## Related News

## People

Tenured and Tenure Track Faculty

### Agnes H. Chan

Faculty Emeritus

Tenured and Tenure Track Faculty

### Ehsan Elhamifar

Assistant Professor

Tenured and Tenure Track Faculty

### Wolfgang Gatterbauer

Associate Professor

Tenured and Tenure Track Faculty

### Huy Lê Nguyen

Assistant Professor

Tenured and Tenure Track Faculty

### Jonathan Ullman

Assistant Professor

Tenured and Tenure Track Faculty

### Emanuele Viola

Associate Professor

Tenured and Tenure Track Faculty

### Daniel Wichs

Associate Professor

Professors of the Practice

### Misha Pavel

Professor of the Practice

Interdisciplinary with Bouvé College of Health Sciences

Administrative Staff Full Time Faculty Teaching Faculty

### Nate Derbinsky

Associate Teaching Professor

Director of Teaching Faculty

Full Time Faculty Teaching Faculty

### John Rachlin

Assistant Teaching Professor

Part Time Faculty Teaching Faculty

### Anurag Bhardwaj

Part-Time Lecturer - Silicon Valley

Part Time Faculty Teaching Faculty

### Rukmini Vijaykumar

Part-Time Lecturer

Post Docs

### Ran Cohen

Postdoctoral Research Associate

Post Docs

### Siyao Guo

Postdoctoral Research Fellow

PhD Students

### Maryam Aziz

PhD Student

PhD Students

### Hridam Basu

PhD Student

PhD Students

### Albert Cheu

PhD Student

PhD Students

### Matthew Dippel

PhD Student

PhD Students

### Xuangui Huang

PhD Student

PhD Students

### Dat Huynh

PhD Student

PhD Students

### Hamidreza Jahanjou

PhD Student

PhD Students

### Lucianna Kiffer

PhD Student

PhD Students

### Eysa Lee

PhD Student

PhD Students

### Chin Ho Lee

PhD Student

PhD Students

### Mehraneh Liaee

PhD Student

PhD Students

### Biswaroop Maiti

PhD Student

PhD Students

### Tanay Mehta

PhD Student

PhD Students

### Vikrant Singhal

PhD Student

PhD Students

### Niklas Smedemark-Margulies

PhD Student

PhD Students

### David Stalfa

PhD Student

PhD Students

### Akshar Varma

PhD Student

PhD Students

### Lydia Zakynthinou

PhD Student

PhD Students

### Giorgos Zirdelis

PhD Student

## Recent Publications

### Computational Business Analytics

Computational Business Analytics, Chapman and Hall/CRC; 1 edition 2014

## Abstract

Computational Business Analytics presents tools and techniques for descriptive, predictive, and prescriptive analytics applicable across multiple domains. Through many examples and challenging case studies from a variety of fields, practitioners easily see the connections to their own problems and can then formulate their own solution strategies.

The book first covers core descriptive and inferential statistics for analytics. The author then enhances numerical statistical techniques with symbolic artificial intelligence (AI) and machine learning (ML) techniques for richer predictive and prescriptive analytics. With a special emphasis on methods that handle time and textual data, the text:

- Enriches principal component and factor analyses with subspace methods, such as latent semantic analyses
- Combines regression analyses with probabilistic graphical modeling, such as Bayesian networks
- Extends autoregression and survival analysis techniques with the Kalman filter, hidden Markov models, and dynamic Bayesian networks
- Embeds decision trees within influence diagrams
- Augments nearest-neighbor and k-means clustering techniques with support vector machines and neural networks

These approaches are not replacements of traditional statistics-based analytics; rather, in most cases, a generalized technique can be reduced to the underlying traditional base technique under very restrictive conditions. The book shows how these enriched techniques offer efficient solutions in areas, including customer segmentation, churn prediction, credit risk assessment, fraud detection, and advertising campaigns.

### Local Differential Privacy for Physical Sensor Data and Sparse Recovery

Anna Gilbert and Audra McMillan. Local Differential Privacy for Physical Sensor Data and Sparse Recovery. In CISS, 2018.

## Abstract

In this work we explore the utility of locally differentially private thermal sensor data. We design a locally differentially private recovery algorithm for the 1-dimensional, discrete heat source location problem and analyse its performance in terms of the Earth Mover Distance error. Our work indicates that it is possible to produce locally private sensor measurements that both keep the exact locations of the heat sources private and permit recovery of the “general geographic vicinity” of the sources. We also discuss the relationship between the property of an inverse problem being ill-conditioned and the amount of noise needed to maintain privacy.

### When is non-trivial estimation possible for graphons and stochastic block models?

Audra McMillan and Adam Smith. When is Nontrivial Estimation Possible for Graphons and Stochastic Block Models? Information and Inference, 2017.

Block graphons (also called stochastic block models) are an important and widely studied class of models for random networks. We provide a lower bound on the accuracy of estimators for block graphons with a large number of blocks. We show that, given only the number k of blocks and an upper bound ρ on the values (connection probabilities) of the graphon, every estimator incurs error Ω(min(ρ,ρk2n2−−−−√)) in the δ2 metric with constant probability for at least some graphons. In particular, our bound rules out any non-trivial estimation (that is, with δ2 error substantially less than ρ ) when k≥nρ−−√ . Combined with previous upper and lower bounds, our results characterize, up to logarithmic terms, the accuracy of graphon estimation in the δ2 metric. A similar lower bound to ours was obtained independently by Klopp et al.

### Enumerating consistent sub-graphs of directed acyclic graphs: an insight into biomedical ontologies

Peng Y, Jiang Y, Radivojac P. Enumerating consistent sub-graphs of directed acyclic graphs: an insight into biomedical ontologies. Bioinformatics (2018) 34(13): i313-i322.

## Abstract

RESULTS:We propose an algorithm for enumerating consistent sub-graphs of DAGs. The algorithm recursively partitions the graph into strictly smaller graphs until the resulting graph becomes a rooted tree (forest), for which a linear-time solution is computed. It then combines the tallies from graphs created in the recursion to obtain the final count. We prove the correctness of this algorithm, propose several practical accelerations, evaluate it on random graphs and then apply it to characterize four major biomedical ontologies. We believe this work provides valuable insights into the complexity of concept annotation spaces and its potential influence on the predictability of ontological annotation.

### Is there an Oblivious RAM Lower Bound for Online Reads?

Mor Weiss, Daniel Wichs: Is there an Oblivious RAM Lower Bound for Online Reads? IACR Cryptology ePrint Archive 2018: 619 (2018)

## Abstract:

Oblivious RAM (ORAM), introduced by Goldreich and Ostrovsky (JACM 1996), can be used to read and write to memory in a way that hides which locations are being accessed. The best known ORAM schemes have an O(log n) overhead per access, where n is the data size. The work of Goldreich and Ostrovsky gave a lower bound showing that this is optimal for ORAM schemes that operate in a “balls and bins” model, where memory blocks can only be shuffled between different locations but not manipulated otherwise. The lower bound even extends to weaker settings such as offline ORAM, where all of the accesses to be performed need to be specified ahead of time, and read-only ORAM, which only allows reads but not writes. But can we get lower bounds for general ORAM, beyond “balls and bins”? The work of Boyle and Naor (ITCS ’16) shows that this is unlikely in the offline setting. In particular, they construct an offline ORAM with o(log n) overhead assuming the existence of small sorting circuits. Although we do not have instantiations of the latter, ruling them out would require proving new circuit lower bounds. On the other hand, the recent work of Larsen and Nielsen (CRYPTO ’18) shows that there indeed is an Ω(log n) lower bound for general online ORAM. This still leaves the question open for online read-only ORAM or for read/write ORAM where we want very small overhead for the read operations. In this work, we show that a lower bound in these settings is also unlikely. In particular, our main result is a construction of online ORAM where reads (but not writes) have an o(log n) overhead, assuming the existence of small sorting circuits as well as very good locally decodable codes (LDCs). Although we do not have instantiations of either of these with the required parameters, ruling them out is beyond current lower bounds.

### Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence

Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence. Maryam Aziz, Jesse Anderton, Emilie Kaufmann and Javed Aslam. ALT 2018.

## Abstract

We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearly-optimal arm with high probability and derive an upper bound on its sample complexity which is within a log factor of our lower bound; and (4) discuss whether our log^2(1/delta) dependence is inescapable for “two-phase” (select arms first, identify the best later) algorithms in the infinite setting. This work permits the application of bandit models to a broader class of problems where fewer assumptions hold.

## Select Projects

### Network Algorithms Under Adversarial and Stochastic Uncertainty

#### Lead PI

#### Funding

### Network Algorithms Under Adversarial and Stochastic Uncertainty

This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.

Modern information networks are composed of heterogeneous nodes and links, whose capacities and capabilities change unexpectedly due to mobility, failures, maintenance, and adversarial attacks. User demands and critical infrastructure needs, however, require that basic primitives including access to information and services be always efficient and reliable. This project studies the design of highly robust networked systems that are resilient to extreme failures and rapid dynamics, and provide optimal performance under a wide spectrum of scenarios with varying levels of predictability.

The focus of this project will be on two problem domains, which together address adversarial network dynamics and stochastic network failures. The first component is a comprehensive theory of information spreading in dynamic networks. The PI will develop an algorithmic toolkit for dynamic networks, including local gossip-style protocols, network coding, random walks, and other diffusion processes. The second component of the project concerns failure-aware network algorithms that provide high availability in the presence of unexpected and correlated failures. The PI will study failure-aware placement of critical resources, and develop flow and cut algorithms under stochastic failures using techniques from chance-constrained optimization. Algorithms tolerant to adversarial and stochastic uncertainty will play a critical role in large-scale heterogeneous information networks of the future. Broader impacts include student training and curriculum development.

### The Control Principles of Complex Systems

#### Lead PI

#### Co PI

#### Funding

### The Control Principles of Complex Systems

Many of the truly difficult problems limiting advances in contemporary science are rooted in our limited understanding of how complex systems are controlled. Indeed, in human cells millions of molecules are embedded in a complex genetic network that lacks an obvious controller; in society billions of individuals interact with each other through intricate trust-family-friendship-professional-association based networks apparently controlled by no one; economic change is driven by what economists call the “invisible hand of the market”, reflecting a lack of understanding of the control principles that govern the interactions between individuals, companies, banks and regulatory agencies.

These and many other examples raise several fundamental questions: What are the control principles of complex systems? How do complex systems organize themselves to achieve sufficient control to ensure functionality? This proposal is motivated by the hypothesis that the architecture of many complex systems is driven by the system’s need to achieve sufficient control to maintain its basic functions. Hence uncovering the control principles of complex self-organized systems can help us understand the fundamental laws that govern them.

### The Integrative Genomics of Acute Asthma Control

#### Lead PI

#### Co PI

Benjamin Alexander Raby

Frank D. Gilliland

#### Funding

### The Integrative Genomics of Acute Asthma Control

Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control.

The over-arching hypothesis of this proposal is that inter-individual differences in asthma control result from the complex interplay of both environmental, genomic, and socioeconomic factors organized in discrete, scale-free molecular networks. Though strict patient compliance with asthma controller therapy and avoidance of environmental triggers are important strategies for the prevention of asthma exacerbation, failure to maintain control is the most common health-related cause of lost school and workdays. Therefore, better understanding of the molecular underpinnings and the role of environmental factors that lead to poor asthma control is needed. Using the Asthma BioRepository for Integrative Genomic Exploration (Asthma BRIDGE), we will perform a series of systems-level genomic analyses that integrate clinical, environmental and various forms of “omic” data (genetics, genomics, and epigenetics) to better understand how molecular processes interact with critical environmental factors to impair asthma control. This proposal consists three Specific Aims, each consisting of three investigational phases: (i) an initial computational discovery phase to define specific molecular networks using the Asthma BRIDGE datasets, followed by two validation phases – (ii) a computational validation phase using an independent clinical cohort, and (iii) an experimental phase to validate critical molecular edges (gene-gene interactions) that emerge from the defined molecular network.

In Specific Aim 1, we will use the Asthma BRIDGE datasets to define interactome sub-module perturbed in poor asthma control;the regulatory variants that modulate this asthma-control module;and to develop a predictive model of asthma control.

In Specific Aim 2, we will study the effects exposure to air pollution and environmental tobacco smoke on modulating the asthma control networks, testing for environment-dependent alterations in network dynamics.

In Specific Aim 3, we will study the impact of inhaled corticosteroids (ICS – the most efficacious asthma-controller medication) on network dynamics of the asthma-control sub-module by comparing network topologies of acute asthma control between subjects taking ICS to those not on ICS. For our experimental validations, we will assess relevant gene-gene interactions by shRNA studies bronchial epithelial and Jurkat T- cell lines. Experimental validations of findings from Aim 2 will be performed by co-treating cells with either cigarette smoke extract (CSE) or ozone. Similar studies will be performed with co-treatment using dexamethasone to validate findings from Aim 2. From the totality of these studies, we will gain new insights into the pathobiology of poor asthma control, and define targets for biomarker development and therapeutic targeting.

Public Health Relevance

Failure to maintain tight asthma symptom control is a major health-related cause of lost school and workdays. This project aims to use novel statistical network-modeling approaches to model the molecular basis of poor asthma control in a well-characterized cohort of asthmatic patients with available genetic, gene expression, and DNA methylation data. Using this data, we will define an asthma-control gene network, and the genetic, epigenetic, and environmental factors that determine inter-individual differences in asthma control.