Nonparametric Identification of Multivariate Mixtures

This article analyzes the identifiability of k-variate, M-component finite mixture models in which each component distribution has independent marginals, including models in latent class analysis. Without making parametric assumptions on the component distributions, we investigate how one can identify the number of components and the component distributions from the distribution function of the observed data. We reveal an important link between the number of variables (k), the number of values each variable can take, and the number of identifiable components. A lower bound on the number of components (M) is nonparametrically identifiable if k >= 2, and the maximum identifiable number of components is determined by the number of different values each variable takes. When M is known, the mixing proportions and the component distributions are nonparametrically identified from matrices constructed from the distribution function of the data if (i) k >= 3, (ii) two of the k variables take at least M different values, and (iii) these matrices satisfy some rank and eigenvalue conditions. For the case of unknown M, we propose an algorithm that possibly identifies M and the component distributions from data. We discuss a condition for nonparametric identification and its observable implications. In case M cannot be identified, we use our identification condition to develop a procedure that consistently estimates a lower bound on the number of components by estimating the rank of a matrix constructed from the distribution function of observed variables.
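
The rank-based lower bound on M can be illustrated with a small population-level sketch. It assumes two discrete variables that are independent within each component, so their joint probability matrix is a sum of M rank-one terms and its rank cannot exceed M. The function name, mixing weights, and component distributions below are hypothetical values chosen for illustration. The sketch computes the rank of an exactly known matrix and is not the article's estimation procedure, which estimates the rank from data.

```python
import numpy as np


def joint_pmf_matrix(weights, comp_x, comp_y):
    """Joint pmf of (X, Y) under a mixture with independent marginals.

    P[a, b] = sum_m weights[m] * comp_x[m, a] * comp_y[m, b],
    i.e. a sum of M rank-one matrices, so rank(P) <= M.
    """
    return np.einsum("m,ma,mb->ab", weights, comp_x, comp_y)


# Hypothetical 3-component mixture; each variable takes 4 values.
weights = np.array([0.5, 0.3, 0.2])
comp_x = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
])
comp_y = np.array([
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])

P = joint_pmf_matrix(weights, comp_x, comp_y)

# The rank of P is a lower bound on the number of components M.
# The explicit tolerance is an assumption suited to this exact matrix.
rank_lower_bound = int(np.linalg.matrix_rank(P, tol=1e-10))
print(rank_lower_bound)
```

Because the rank of a 4 x 4 matrix is at most 4, this construction could never reveal more than four components, which mirrors the abstract's point that the maximum identifiable number of components depends on how many values each variable takes.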