In the set cover problem, we are given a collection of $m$ subsets over a universe of $n$ elements, and the goal is to find a sub-collection of sets whose union covers the universe. The set cover problem is a fundamental optimization problem with many applications in computer science and related disciplines. In this talk, we investigate the set cover problem in the streaming model of computation whereby the sets are presented one by one in a stream, and the goal is to solve the set cover problem using a space-efficient algorithm.

We show that to compute an $\alpha$-approximate set cover (for any $\alpha= o(\sqrt{n})$) via a single-pass streaming algorithm, $\Theta(mn/\alpha)$ space is both necessary and sufficient (up to an $O(\log{n})$ factor). We further study the problem of estimating the size of a minimum set cover (as opposed to finding the actual sets), and show that this turns out to be a distinctly easier problem. Specifically, we prove that $\Theta(mn/\alpha^2)$ space is both sufficient and necessary (up to logarithmic factors) for estimating the size of a minimum set cover to within a factor of $\alpha$. Our algorithm in fact works for the more general problem of estimating the optimal value of a covering integer program. These results provide a tight resolution of the space-approximation tradeoff for single-pass streaming algorithms for the set cover problem.

This is joint work with my students Sepehr Assadi and Yang Li.