# Completing the Covariance: Deterministic matrix completion for symmetric positive semi-definite matrices

This week in J. Pillz Happy Hour we discussed the following work:

Deterministic Symmetric Positive Semi-definite Matrix Completion
William E. Bishop and Byron M. Yu, NIPS, 2014

This paper took on the interesting problem of low-rank matrix completion with the twists that (1) the observed entries of the matrix $A$ are chosen in a deterministic manner and (2) those entries consist of the union of principal sub-matrices $A_l$, for $l = 1, 2, \ldots, K$. The large body of prior work in this field mostly relied on random sampling, i.e. we get to observe some set of $M$ entries $Y_{i,j} = A_{i,j} + \epsilon$ of $A$ chosen uniformly at random from the set of all entries (see, e.g., Candes & Recht, Candes & Plan). These guarantees had the nice property of holding with high probability, meaning that despite the probabilistic nature of the observed entry locations, solving a convex optimization program was virtually guaranteed to recover the original matrix up to a bounded recovery error. Additionally, this recovery error bound correctly deduced the linear dependence on the error in the observations $\epsilon$. The framework was also extensible to cases where $M$ linear combinations of the entries of $A$ were observed, significantly expanding the set of problems the theory was applicable to.
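To make the random-sampling setting concrete, here is a minimal numpy sketch of completing a low-rank matrix from uniformly sampled entries. This is an alternating-projections heuristic (iterated rank-$r$ truncation with data consistency), not the convex nuclear-norm program from the papers cited above; the function name and iteration count are my own choices.

```python
import numpy as np

def complete_random(Y, mask, r, n_iters=1000):
    """Heuristic completion from randomly sampled entries.

    Y    : matrix holding observed values (entries where mask is False are ignored)
    mask : boolean array, True where an entry was observed
    r    : assumed rank of the underlying matrix
    """
    X = np.where(mask, Y, 0.0)
    for _ in range(n_iters):
        # Project onto the set of rank-r matrices via truncated SVD.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]
        # Re-impose the observed entries.
        X = np.where(mask, Y, X)
    return X

# Example: a random rank-2 matrix with 70% of entries observed.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))
mask = rng.random((20, 20)) < 0.7
X = complete_random(np.where(mask, A, 0.0), mask, r=2)
```

With this oversampling ratio the missing entries are typically recovered to high accuracy, though unlike the convex formulation this iteration carries no formal guarantee.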

For the case considered by Bishop & Yu, the random sampling framework considered previously was not useful, as there was much more structure in the problem being considered. Taking the motivating example of recovering a covariance matrix, the authors assume that instead of covariances between pairs of variables being observed individually, the covariance matrices among subsets of variables are each fully observed. They assume that there is no randomness in which variables are observed, and instead seek to devise (1) conditions on when the full covariance is recoverable from the observed subset covariances, and (2) an algorithm that takes the subset covariances and recovers the full covariance. More generally, the authors phrase this as the recovery of symmetric positive semi-definite matrices from principal sub-blocks.

While the conditions are presented first in the paper, the algorithm actually motivated the need for the specific conditions. The algorithm devised is simple and elegant, and relies on the fact that an SPSD matrix can be decomposed as $A = CC^T$. Similarly, any principal sub-matrix $A_l$ of the rank-$r$ matrix $A$ can be decomposed as $A_l = C_lC_l^T$. Additionally, each $C_l$ matrix can, under a suitable rotation, be made to match the matrix $C$ over the indices corresponding to the rows spanned by the sub-matrix $A_l$. These observations lead to the following algorithm: for each block $A_l$, compute the eigenvalue decomposition $A_l = \Sigma_l\Lambda_l\Sigma^T_l$ and set $\widehat{C}_l = \Sigma_l\Lambda_l^{1/2}$. These estimates of the $C_l$ sub-matrices will not match on their overlaps, so they must be rotated into agreement. One sub-matrix is chosen as the seed, and every other matrix is, in turn, rotated to match the current set of values of $\widehat{C}$ on the overlap. The order in which the matrices are rotated is chosen to maximize, at each step, the overlap with all sub-matrices that came before. The final estimate of $A$ is then just the outer product of the estimate of $C$: $\widehat{A} = \widehat{C}\widehat{C}^T$.
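The steps above can be sketched in a few lines of numpy. Two simplifications relative to the paper: blocks are processed in the order given rather than in the overlap-maximizing order, and the alignment rotation is solved via the orthogonal Procrustes problem (an SVD of the cross-correlation on the overlap), which is one natural way to realize the matching step.

```python
import numpy as np

def complete_psd(blocks, index_sets, n, r):
    """Assemble an n x n rank-r SPSD matrix from principal sub-blocks.

    blocks[k]     : observed principal sub-matrix over rows/cols index_sets[k]
    index_sets[k] : global indices of block k
    """
    C_hat = np.full((n, r), np.nan)  # NaN marks rows not yet estimated
    for k, (A_l, idx) in enumerate(zip(blocks, index_sets)):
        idx = np.asarray(idx)
        # Factor the block: A_l = C_l C_l^T via its top-r eigenpairs.
        w, V = np.linalg.eigh(A_l)
        w = np.clip(w, 0.0, None)
        top = np.argsort(w)[::-1][:r]
        C_l = V[:, top] * np.sqrt(w[top])
        if k == 0:
            C_hat[idx] = C_l  # seed block
            continue
        filled = ~np.isnan(C_hat[idx]).any(axis=1)
        ov, new = np.where(filled)[0], np.where(~filled)[0]
        # Orthogonal Procrustes: rotate C_l to agree with C_hat on the overlap.
        U, _, Vt = np.linalg.svd(C_l[ov].T @ C_hat[idx[ov]])
        R = U @ Vt
        C_hat[idx[new]] = C_l[new] @ R
    return C_hat @ C_hat.T

# Example: rank-2, 5 x 5, two blocks whose overlap {2, 3} has rank 2.
rng = np.random.default_rng(0)
C = rng.standard_normal((5, 2))
A = C @ C.T
I1, I2 = [0, 1, 2, 3], [2, 3, 4]
A_hat = complete_psd([A[np.ix_(I1, I1)], A[np.ix_(I2, I2)]], [I1, I2], n=5, r=2)
```

In the noiseless example the overlap has full rank $r$, so the rotation is exact and `A_hat` matches `A` to machine precision, including the never-observed entries such as $A_{0,4}$.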

The conditions for this algorithm to work boil down to requiring that (1) the observed sub-matrices are, in fact, principal sub-matrices whose index sets together span all the rows of the matrix, and (2) there is “enough” overlap between the principal sub-matrices for the alignment step of the algorithm to work. While (1) is fairly obvious, (2) boils down to needing the index set shared by each matrix and those that came before it (in the ordering discussed above) to have rank $r$: the same rank as $A$. When these conditions are satisfied, the authors prove that with no noise their algorithm recovers the true matrix $A$ exactly, and that with noise it recovers $A$ up to an error bounded as $||A -\widehat{A}||_F^2 \leq K_1\sqrt{r\epsilon} + K_2\epsilon$, where $K_1, K_2$ are fairly involved functions of the eigenvalues of $A$ and the $A_l$. Interestingly, they show that without their assumptions, no algorithm can guarantee recovery of the matrix $A$ from its principal sub-matrices.
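Because these conditions depend only on the index sets and the ranks of observed overlap sub-matrices, they can be verified directly from the data. A rough sketch of such a check, under my reading of the conditions and for a fixed block ordering (the function name and interface are hypothetical):

```python
import numpy as np

def check_recoverability(A, index_sets, r, tol=1e-10):
    """Check, for the given block ordering, that (1) the index sets cover all
    rows of A and (2) each block's overlap with its predecessors has rank r.
    Uses observed entries only: the overlap of two observed principal
    sub-matrices is itself observed."""
    n = A.shape[0]
    covered = set(index_sets[0])
    for idx in index_sets[1:]:
        ov = sorted(covered & set(idx))
        # The principal sub-matrix on the overlap must have full rank r.
        if len(ov) < r or np.linalg.matrix_rank(A[np.ix_(ov, ov)], tol=tol) < r:
            return False
        covered |= set(idx)
    return covered == set(range(n))  # condition (1): all rows spanned
```

For instance, with a rank-2 matrix, blocks over $\{0,1,2,3\}$ and $\{2,3,4\}$ generically pass (overlap $\{2,3\}$ has rank 2), while $\{0,1,2\}$ and $\{2,3,4\}$ fail, since the single-index overlap can have rank at most 1.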

While these results mimic the style of results in the prior literature, the assumptions in this work can be easily checked given the set of observed indices. The authors consider this a benefit compared to the assumptions in previous matrix completion theory, which cannot be checked from a given set of measurements. Additionally, the authors point out that their assumptions guarantee recovery for all rank-$r$ matrices, indicating that there cannot be an adversarial matrix on which the method fails. What the authors sacrifice for these benefits is two-fold. For one, the rank restriction on the overlap is quite prohibitive, and basically requires that the full complexity of the entire matrix is represented in each overlap. Thus, while there do exist sets of principal sub-matrices that span very few of the entries of $A$ and satisfy the assumptions, these are probably very unlikely sets. Unfortunately, the authors' Theorem 1 specifies that this is an unavoidable aspect of sampling principal sub-matrices. The second detriment is that the proof techniques used for the noisy recovery guarantee result in an error bound that is sensitive to the specific eigen-spectrum of $A$, opaque in its behavior, and does not capture the true dependence on the perturbation $\epsilon$. The eigenvalue dependence is reminiscent of non-uniform recovery guarantees in the compressive sensing literature (e.g., block-diagonal sensing matrices; Eftekhari et al.). It would have been nice to see some discussion in the paper of whether this dependency is a fundamental aspect of the problem or a proof artifact. Overall, however, the simplicity of the algorithm combined with the guarantees make this a very nice paper.