This paper addresses the selection of multivariate GARCH models in terms of variance matrix forecasting accuracy, with a particular focus on relatively large-scale problems. We consider 10 assets from the NYSE and NASDAQ and compare 125 model-based one-step-ahead conditional variance forecasts over a period of 10 years using the model confidence set (MCS) and the Superior Predictive Ability (SPA) tests. Model performances are evaluated using four statistical loss functions which account for different types and degrees of asymmetry with respect to over- and under-predictions. When considering the full sample, MCS results are strongly driven by short periods of high market instability during which multivariate GARCH models appear to be inaccurate. Over relatively unstable periods, i.e. the dot-com bubble, the set of superior models is composed of more sophisticated specifications such as orthogonal and dynamic conditional correlation (DCC) models, both with leverage effects in the conditional variances. However, unlike the DCC models, the orthogonal specifications tend to underestimate the conditional variance. Over calm periods, a simple assumption such as constant conditional correlation and symmetry in the conditional variances cannot be rejected. Finally, during the 2007-2008 financial crisis, accounting for non-stationarity in the conditional variance process generates superior forecasts. The SPA test suggests that, regardless of the period, the best models do not provide significantly better forecasts than the DCC model of Engle (2002) with leverage in the conditional variances of the returns.