ITW 2015

THE HIDDEN SUBMATRIX PROBLEM

Andrea Montanari, Stanford Univ., EE+Statistics Depts.

Given a large data matrix A, we consider the problem of determining whether its entries are independent and identically distributed, or it contains submatrices (subsets of row and columns) whose entries have a distribution that differs from the others.
This problem arises in a large number of applications, ranging from network analysis (finding hidden structures/communities in networks), to genomics (finding differentially expressed genes in microarray experiments), to signal processing (finding low dimensional and sparse representations of high-dimensional signals).
I will argue that our understanding of these problems is still in its infancy, by describing a large number of fascinating open problems, and a small number of solved ones.