## Contact

### Office Location

440 Huntington Avenue

358 West Village H

Boston, MA 02115

### Mailing Address

Northeastern University

ATTN: Huy Nguyen, 202 WVH

360 Huntington Avenue

Boston, MA 02115

## Research Interests

- Algorithmic techniques for massive data sets
- Optimization
- Machine learning

## Education

- PhD in Computer Science, Princeton University
- MEng in Computer Science, MIT
- BS in Computer Science and Mathematics, MIT

## Biography

Huy Lê Nguyen is an Assistant Professor of Computer Science in the College of Computer and Information Science at Northeastern University. Prior to joining Northeastern, he was a Research Assistant Professor at the Toyota Technological Institute at Chicago and, before that, a Google Research Fellow at the Simons Institute at the University of California, Berkeley. He received his PhD in Computer Science from Princeton University. Professor Nguyen is broadly interested in the design and analysis of algorithms, with an emphasis on algorithmic techniques for massive data sets and machine learning.

## Recent Publications

### Decomposable Submodular Function Minimization: Discrete and Continuous

Alina Ene, Huy L. Nguyen, László A. Végh. Decomposable Submodular Function Minimization: Discrete and Continuous. NIPS 2017: 2874-2884

**Abstract**

This paper investigates connections between discrete and continuous approaches for decomposable submodular function minimization. We provide improved running time estimates for the state-of-the-art continuous algorithms for the problem using combinatorial arguments. We also provide a systematic experimental comparison of the two types of methods, based on a clear distinction between level-0 and level-1 algorithms.
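As a toy illustration of the problem setup (not the paper's algorithms), the sketch below minimizes a decomposable submodular function F(S) = Σᵢ fᵢ(S) by brute force over a tiny ground set, using graph-cut functions as the "simple" summands; all names are illustrative.

```python
from itertools import chain, combinations

def cut_function(edges):
    """Return f(S) = number of edges crossing the cut (S, V \\ S), a classic
    submodular function."""
    def f(S):
        return sum(1 for (u, v) in edges if (u in S) != (v in S))
    return f

def minimize_decomposable(fs, ground):
    """Exhaustively minimize F(S) = sum of the fs over all subsets of ground.
    Exponential time; only viable for toy instances."""
    subsets = chain.from_iterable(
        combinations(ground, r) for r in range(len(ground) + 1))
    best_set, best_val = None, float("inf")
    for sub in subsets:
        S = frozenset(sub)
        val = sum(f(S) for f in fs)
        if val < best_val:
            best_set, best_val = S, val
    return best_set, best_val

ground = {0, 1, 2, 3}
fs = [cut_function([(0, 1), (1, 2)]), cut_function([(2, 3)])]
S, val = minimize_decomposable(fs, ground)
# The empty set (or the full ground set) cuts no edges, so the minimum is 0.
```

The discrete and continuous algorithms studied in the paper exploit the decomposable structure to avoid exactly this kind of exhaustive search.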

### Approximate near neighbors for general symmetric norms

Alexandr Andoni, Huy L. Nguyen, Aleksandar Nikolov, Ilya P. Razenshteyn, Erik Waingarten. Approximate near neighbors for general symmetric norms. STOC 2017: 902-913

**Abstract**

We show that every *symmetric* normed space admits an efficient nearest neighbor search data structure with doubly-logarithmic approximation. Specifically, for every *n*, *d* = *n*^{o(1)}, and every *d*-dimensional symmetric norm ‖·‖, there exists a data structure for (log log *n*)-approximate nearest neighbor search over ‖·‖ for *n*-point datasets achieving *n*^{o(1)} query time and *n*^{1+o(1)} space. The main technical ingredient of the algorithm is a low-distortion embedding of a symmetric norm into a low-dimensional iterated product of top-*k* norms.

We also show that our techniques cannot be extended to *general* norms.
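The top-*k* norm mentioned above, the sum of the *k* largest coordinates in absolute value, is easy to state concretely. A minimal sketch (names are illustrative) showing how it interpolates between ℓ_∞ (k = 1) and ℓ_1 (k = d):

```python
def top_k_norm(x, k):
    """Sum of the k largest absolute values of the coordinates of x."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

x = [3.0, -5.0, 1.0, 2.0]
top_k_norm(x, 1)  # 5.0, the l_infinity norm
top_k_norm(x, 2)  # 8.0, i.e. 5 + 3
top_k_norm(x, 4)  # 11.0, the l_1 norm
```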

### Heavy Hitters via Cluster-Preserving Clustering

Kasper Green Larsen, Jelani Nelson, Huy L. Nguyen, Mikkel Thorup: Heavy Hitters via Cluster-Preserving Clustering. FOCS 2016: 61-70

**Abstract**

In the ℓ_p heavy hitters problem with parameter ε, one must maintain a high-dimensional vector x ∈ ℝ^n subject to updates of the form update(i, Δ) causing the change x_i ← x_i + Δ, where i ∈ [n], Δ ∈ ℝ. Upon receiving a query, the goal is to report every "heavy hitter" i ∈ [n] with |x_i| ≥ ε‖x‖_p as part of a list L ⊆ [n] of size O(1/ε^p), i.e. proportional to the maximum possible number of heavy hitters. For any p ∈ (0,2] the COUNTSKETCH of [CCFC04] solves ℓ_p heavy hitters using O(ε^{-p} lg n) words of space with O(lg n) update time, O(n lg n) query time to output L, and whose output after any query is correct with high probability (whp) 1 - 1/poly(n) [JST11, Section 4.4]. This space bound is optimal even in the strict turnstile model [JST11], in which it is promised that x_i ≥ 0 for all i ∈ [n] at all points in the stream, but unfortunately the query time is very slow. To remedy this, the work [CM05] proposed the "dyadic trick" for the COUNTMIN sketch for p = 1 in the strict turnstile model, which to maintain whp correctness achieves suboptimal space O(ε^{-1} lg^2 n), worse update time O(lg^2 n), but much better query time O(ε^{-1} poly(lg n)). An extension to all p ∈ (0,2] appears in [KNPW11, Theorem 1], and can be obtained from [Pag13].

We show that this tradeoff between space and update time versus query time is unnecessary. We provide a new algorithm, EXPANDERSKETCH, which in the most general turnstile model achieves optimal O(ε^{-p} lg n) space, O(lg n) update time, and fast O(ε^{-p} poly(lg n)) query time, providing correctness whp. In fact, a simpler version of our algorithm for p = 1 in the strict turnstile model answers queries even faster than the "dyadic trick" by roughly a lg n factor, dominating it in all regards. Our main innovation is an efficient reduction from the heavy hitters problem to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a "cluster-preserving clustering" algorithm that partitions the graph into clusters with the promise of not destroying any original cluster. To do this we first apply standard spectral graph partitioning, and then we use novel combinatorial techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our cluster-preserving clustering may be of broader interest, well beyond heavy hitters.

### Constrained Submodular Maximization: Beyond 1/e

Alina Ene, Huy L. Nguyen. Constrained Submodular Maximization: Beyond 1/e. FOCS 2016: 248-257

### A New Framework for Distributed Submodular Maximization

Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, Justin Ward. A New Framework for Distributed Submodular Maximization. FOCS 2016: 645-654

### Time Lower Bounds for Nonadaptive Turnstile Streaming Algorithms

Kasper Green Larsen, Jelani Nelson, Huy L. Nguyen. Time Lower Bounds for Nonadaptive Turnstile Streaming Algorithms. STOC 2015: 803-812

**Abstract**

We say a turnstile streaming algorithm is *non-adaptive* if, during updates, the memory cells written and read depend only on the index being updated and random coins tossed at the beginning of the stream (and not on the memory contents of the algorithm). Memory cells read during *queries* may be decided upon adaptively. All known turnstile streaming algorithms in the literature, except a single recent example for a particular *promise problem* [7], are non-adaptive. In fact, even more specifically, they are all linear sketches. We prove the first non-trivial update time lower bounds for both randomized and deterministic turnstile streaming algorithms, which hold when the algorithms are non-adaptive. While there has been abundant success in proving space lower bounds, there have been no non-trivial turnstile update time lower bounds. Our lower bounds hold against classically studied problems such as heavy hitters, point query, entropy estimation, and moment estimation. In some cases of deterministic algorithms, our lower bounds nearly match known upper bounds.
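A non-adaptive linear sketch in this sense can be illustrated with a toy AMS-style second-moment estimator: the cells an update writes, and the values written, depend only on the updated index and coins fixed before the stream begins (a hedged sketch, not from the paper; names are illustrative):

```python
import random

class AMSSketch:
    """Toy AMS sketch estimating ||x||_2^2 for x in R^n under turnstile updates."""
    def __init__(self, n, rows, seed=1):
        rng = random.Random(seed)
        # Random +/-1 signs fixed before the stream begins (the coins).
        self.signs = [[rng.choice((-1, 1)) for _ in range(n)]
                      for _ in range(rows)]
        self.cells = [0.0] * rows

    def update(self, i, delta):
        # Each cell's change depends only on i, delta, and the fixed coins,
        # never on current memory contents: the sketch is non-adaptive
        # (in fact linear in x).
        for r in range(len(self.cells)):
            self.cells[r] += self.signs[r][i] * delta

    def estimate_f2(self):
        # The mean of the squared cells is an unbiased estimate of ||x||_2^2.
        return sum(c * c for c in self.cells) / len(self.cells)

sk = AMSSketch(n=2, rows=400)
sk.update(0, 3.0)   # x_0 += 3
sk.update(1, 4.0)   # x_1 += 4
# sk.estimate_f2() approximates ||x||_2^2 = 25.
```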

## Previous Publications

### 2015

### Approximate k-flat Nearest Neighbor Search

Wolfgang Mulzer, Huy L. Nguyen, Paul Seiferth, Yannik Stein. Approximate k-flat Nearest Neighbor Search. STOC 2015: 783-792

**Abstract**

Let k ≥ 0 be an integer. In the *approximate k-flat nearest neighbor* (k-ANN) problem, we are given a set P ⊂ R^{d} of n points in d-dimensional space and a fixed approximation factor c > 1. Our goal is to preprocess P so that we can efficiently answer *approximate k-flat nearest neighbor queries*: given a k-flat F, find a point in P whose distance to F is within a factor c of the distance between F and the closest point in P. The case k = 0 corresponds to the well-studied approximate nearest neighbor problem, for which a plethora of results are known, both in low and high dimensions. The case k = 1 is called *approximate line nearest neighbor*. In this case, we are aware of only one provably efficient data structure, due to Andoni, Indyk, Krauthgamer, and Nguyen (AIKN) [2]. For k ≥ 2, we know of no previous results.

We present the first efficient data structure that can handle approximate nearest neighbor queries for arbitrary k. We use a data structure for 0-ANN-queries as a black box, and the performance depends on the parameters of the 0-ANN solution: suppose we have a 0-ANN structure with query time O(n^{ρ}) and space requirement O(n^{1+σ}), for ρ, σ > 0. Then we can answer k-ANN queries in time O(n^{k/(k + 1 - ρ) + t}) and space O(n^{1+σk/(k + 1 - ρ)} + n log^{O(1/t)} n). Here, t > 0 is an arbitrary constant and the O-notation hides exponential factors in k, 1/t, and c, and polynomials in d.

Our approach generalizes the techniques of AIKN for 1-ANN: we partition P into *clusters* of increasing radius, and we build a low-dimensional data structure for a random projection of P. Given a query flat F, the query can be answered directly in clusters whose radius is “small” compared to d(F, P) using a grid. For the remaining points, the low dimensional approximation turns out to be precise enough. Our new data structures also give an improvement in the space requirement over the previous result for 1-ANN: we can achieve near-linear space and sublinear query time, a further step towards practical applications where space constitutes the bottleneck.
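The basic query primitive, the distance from a point to a k-flat, reduces to an orthogonal projection. A minimal sketch, assuming the flat is given by a base point and an orthonormal basis of its direction space (names are illustrative):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def dist_to_flat(p, a, basis):
    """Distance from point p to the k-flat through a spanned by an
    orthonormal basis (k = len(basis); k = 0 gives point-to-point distance)."""
    w = [pi - ai for pi, ai in zip(p, a)]
    # Subtract the projection of w onto each basis direction of the flat.
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return dot(w, w) ** 0.5

# A 1-flat (line) through the origin along the x-axis in R^3:
d = dist_to_flat((1.0, 3.0, 4.0), (0.0, 0.0, 0.0), [(1.0, 0.0, 0.0)])
# The residual after projection is (0, 3, 4), so the distance is 5.
```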

### 2016

### Communication lower bounds for statistical estimation problems via a distributed data processing inequality

Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, David P. Woodruff. Communication lower bounds for statistical estimation problems via a distributed data processing inequality. STOC 2016: 1011-1020

**Abstract**

We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the m machines receives n data points from a d-dimensional Gaussian distribution with unknown mean θ which is promised to be k-sparse. The machines communicate by message passing and aim to estimate the mean θ. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least Ω(min{n,d}m), where n is the number of observations that each machine receives and d is the ambient dimension. These lower bounds improve upon Shamir (NIPS'14) and Steinhardt-Duchi (COLT'15) by allowing a multi-round, iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a distributed data processing inequality, a generalization of the usual data processing inequality, which might be of independent interest and useful for other problems.
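As a baseline for this setting (not the paper's optimal protocol), a naive simultaneous protocol has every machine send its full d-dimensional sample mean and the coordinator average them, costing m·d numbers of communication. A hedged sketch with illustrative names:

```python
import random

def local_mean(samples, d):
    # Each machine summarizes its n samples by the coordinate-wise mean.
    return [sum(s[j] for s in samples) / len(samples) for j in range(d)]

def coordinator(messages, d):
    # The coordinator averages the m received local means.
    return [sum(msg[j] for msg in messages) / len(messages) for j in range(d)]

rng = random.Random(0)
theta, d, m, n = [1.0, -2.0, 0.0], 3, 10, 50
# m machines, each with n samples from N(theta, I_d).
machines = [[[theta[j] + rng.gauss(0, 1) for j in range(d)] for _ in range(n)]
            for _ in range(m)]
estimate = coordinator([local_mean(s, d) for s in machines], d)
# estimate is close to theta; per-coordinate error shrinks like 1/sqrt(m*n).
```

The paper's lower bounds quantify how much this m·d communication can be reduced (exploiting sparsity of θ) before the estimation error must degrade.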

### 2015

### Random Coordinate Descent Methods for Minimizing Decomposable Submodular Functions

Alina Ene, Huy L. Nguyen. Random Coordinate Descent Methods for Minimizing Decomposable Submodular Functions. ICML 2015: 787-795

**Abstract**

Submodular function minimization is a fundamental optimization problem that arises in several applications in machine learning and computer vision. The problem is known to be solvable in polynomial time, but general purpose algorithms have high running times and are unsuitable for large-scale problems. Recent work has used convex optimization techniques to obtain very practical algorithms for minimizing functions that are sums of "simple" functions. In this paper, we use random coordinate descent methods to obtain algorithms with faster linear convergence rates and cheaper iteration costs. Compared to alternating projection methods, our algorithms do not rely on full-dimensional vector operations and they converge in significantly fewer iterations.
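Random coordinate descent itself is easy to illustrate on a strongly convex quadratic (a toy, not the paper's application to a convex relaxation of submodular minimization): each step exactly minimizes over one randomly chosen coordinate, avoiding full-dimensional vector operations:

```python
import random

def random_coordinate_descent(A, b, steps=2000, seed=0):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive definite A
    by exact minimization over one random coordinate per step."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        i = rng.randrange(n)
        # Optimal value of x_i with all other coordinates held fixed;
        # each step costs O(n), not a full-dimensional matrix operation.
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]  # symmetric positive definite
b = [1.0, 2.0]
x = random_coordinate_descent(A, b)
# Converges linearly to the minimizer A^{-1} b = (1/11, 7/11).
```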

### The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, Justin Ward. The Power of Randomization: Distributed Submodular Maximization on Massive Datasets. ICML 2015: 1236-1244

**Abstract**

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We develop a simple distributed algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency on large problems with different kinds of constraints, with objective values always close to what is achievable in the centralized setting.
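The two-round pattern behind such distributed algorithms can be sketched for cardinality-constrained max coverage (an illustrative toy in the spirit of the framework, not the paper's exact algorithm or its guarantees; names are assumptions):

```python
import random

def greedy(items, k):
    """Standard greedy: repeatedly add the set with the largest marginal coverage."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(items, key=lambda s: len(s - covered))
        chosen.append(best)
        covered |= best
    return chosen

def distributed_greedy(items, k, machines=3, seed=0):
    rng = random.Random(seed)
    # Round 1: randomly partition the items; each machine runs greedy locally.
    parts = [[] for _ in range(machines)]
    for s in items:
        parts[rng.randrange(machines)].append(s)
    candidates = [s for part in parts if part
                  for s in greedy(part, min(k, len(part)))]
    # Round 2: a single machine runs greedy on the pooled candidate sets.
    return greedy(candidates, k)

items = [frozenset(r) for r in ([1, 2, 3], [3, 4], [4, 5, 6], [1, 6], [2, 5])]
sol = distributed_greedy(items, k=2)
coverage = len(set().union(*sol))
# Two sets suffice to cover all six elements ({1,2,3} and {4,5,6}).
```

The random partition in round 1 is the key design choice: it is what lets such frameworks prove constant-factor guarantees in expectation rather than only heuristic performance.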