Some new Machine Learning Papers

Congratulations to Zhenlin Xu, whose ICLR paper on robust and generalizable visual representation learning was accepted [zx20]. Congratulations also to Chris and Florian, whose papers on graph filtration learning [cha20], topologically densified distributions [chb20], and an analysis of the supervised contrastive loss [fg21] were accepted at ICML 2020/2021. Lastly, we also published a paper at NeurIPS 2020 that uses an initial-value perspective to parameterize deep neural networks [fx20].

ICLR 2021

[zx20] Robust and generalizable visual representation learning

ICML 2020/2021

[cha20] Graph filtration learning
[chb20] Topologically densified distributions
[fg21] Dissecting supervised contrastive learning

Dissecting Supervised Contrastive Learning

Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. However, recent works show that one can directly optimize the encoder instead, obtaining equally (or even more) discriminative representations via a supervised variant of a contrastive objective. In this work, we address the question of whether there are fundamental differences in the sought-for representation geometry in the output space of the encoder at minimal loss. Specifically, we prove, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex inscribed in a hypersphere. We provide empirical evidence that this configuration is attained in practice and that reaching a close-to-optimal state typically indicates good generalization performance. Yet, the two losses show remarkably different optimization behavior: the number of iterations required to perfectly fit the data scales superlinearly with the amount of randomly flipped labels for the supervised contrastive loss, in contrast to the approximately linear scaling previously reported for networks trained with cross-entropy.
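For readers who want to see the simplex geometry concretely, here is a small NumPy sketch (ours, not code from the paper): it builds the vertices of a regular simplex inscribed in the unit hypersphere, places all samples of each class on their class vertex, and checks that a standard supervised contrastive loss is indeed much lower there than for random unit-norm features. The function names, temperature, and dimensions are illustrative choices, not from the paper.

```python
# Illustrative sketch (not the authors' code): the simplex configuration
# at which the supervised contrastive loss attains its minimum.
import numpy as np

def simplex_vertices(C: int, d: int) -> np.ndarray:
    """Vertices of a regular simplex with C vertices, inscribed in the
    unit hypersphere in R^d (for simplicity we require d >= C)."""
    assert d >= C
    v = np.eye(C) - 1.0 / C                        # center the standard basis
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # project onto the sphere
    out = np.zeros((C, d))
    out[:, :C] = v                                 # zero-pad into R^d
    return out

def sup_con_loss(z, y, tau=0.5):
    """Supervised contrastive loss over unit-norm features z, labels y."""
    n = len(y)
    sim = z @ z.T / tau
    not_self = ~np.eye(n, dtype=bool)
    per_anchor = []
    for i in range(n):
        pos = (y == y[i]) & not_self[i]            # same-class positives
        log_denom = np.log(np.exp(sim[i][not_self[i]]).sum())
        per_anchor.append(-(sim[i][pos] - log_denom).mean())
    return float(np.mean(per_anchor))

rng = np.random.default_rng(0)
C, d, per_class = 4, 8, 16
y = np.repeat(np.arange(C), per_class)

# Every sample of class c collapses to vertex c of the simplex.
z_simplex = simplex_vertices(C, d)[y]

# Baseline: random unit-norm features of the same shape.
z_rand = rng.normal(size=(C * per_class, d))
z_rand /= np.linalg.norm(z_rand, axis=1, keepdims=True)

print(np.round(simplex_vertices(C, d) @ simplex_vertices(C, d).T, 3))
print("loss at simplex:", sup_con_loss(z_simplex, y))
print("loss at random :", sup_con_loss(z_rand, y))
```

The Gram matrix printed first has ones on the diagonal and -1/(C-1) off the diagonal, which is the defining property of a regular simplex on the hypersphere; the loss at the collapsed-to-simplex configuration comes out clearly below the random baseline.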

NeurIPS 2020

[fx20] A shooting formulation of deep learning

A shooting formulation of deep learning

Continuous-depth neural networks can be viewed as deep limits of discrete neural networks whose dynamics resemble a discretization of an ordinary differential equation (ODE). Although important steps have been taken to realize the advantages of such continuous formulations, most current techniques are not truly continuous-depth as they assume identical layers. Indeed, existing works throw into relief the myriad difficulties presented by an infinite-dimensional parameter space in learning a continuous-depth neural ODE. To address this, we introduce a shooting formulation which shifts the perspective from parameterizing a network layer-by-layer to parameterizing over optimal networks described only by a set of initial conditions. For scalability, we propose a novel particle-ensemble parameterization which fully specifies the optimal weight trajectory of the continuous-depth neural network. Our experiments show that our particle-ensemble shooting formulation can achieve competitive performance, especially on long-range forecasting tasks. Finally, although the current work is inspired by continuous-depth neural networks, the particle-ensemble shooting formulation also applies to discrete-time networks and may open a fertile new area of research on deep-learning parameterization.
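To give a feel for the initial-value idea, here is a deliberately simple toy sketch (ours, not the paper's shooting system): instead of learning separate weights per layer, one learns only the initial condition of a weight-evolution ODE and integrates it alongside the hidden state, so a single pair of matrices determines the entire depth-continuous weight trajectory. The dynamics chosen here (dW/dt = A, dA/dt = 0) and all names are assumptions made purely for illustration.

```python
# Toy sketch of the shooting idea (NOT the paper's formulation): the
# whole weight trajectory W(t) is reconstructed by integrating a fixed
# evolution equation from learned initial conditions (W0, A0).
import numpy as np

def shoot_forward(h0, W0, A0, T=1.0, steps=20):
    """Euler-integrate the coupled state/weight system
         dh/dt = tanh(h @ W(t).T),   dW/dt = A,   dA/dt = 0.
    Only (W0, A0) would be trained; deeper 'layers' cost no new
    parameters because W(t) follows from the initial conditions."""
    h, W, A = h0.copy(), W0.copy(), A0.copy()
    dt = T / steps
    for _ in range(steps):
        h = h + dt * np.tanh(h @ W.T)   # state follows current weights
        W = W + dt * A                  # weights follow their own ODE
    return h

rng = np.random.default_rng(0)
d = 4
h0 = rng.normal(size=(8, d))            # batch of 8 inputs
W0 = 0.1 * rng.normal(size=(d, d))      # initial weights (trainable)
A0 = 0.1 * rng.normal(size=(d, d))      # initial weight velocity (trainable)
print(shoot_forward(h0, W0, A0).shape)  # -> (8, 4)
```

In the paper, the weight dynamics are not hand-picked like this; the weight trajectory is the optimal one, fully specified by the particle-ensemble parameterization. The sketch only conveys the structural point that the number of trainable parameters is tied to the initial conditions rather than to depth.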

UNC Biomedical Image Analysis Group (UNC-biag)