The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes

Kocsis, Peter; Súkeník, Peter; Braso, Guillem; Niessner, Matthias; Leal-Taixé, Laura; Elezi, Ismail

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track

Abstract

Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformers or MLP-based architectures started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for use in low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes. We further present an online joint knowledge-distillation method to utilize the extra FC layers at train time but avoid them at test time. This allows us to improve the generalization of a CNN-based model without any increase in the number of weights at test time. We perform classification experiments for a large range of network backbones and several standard datasets on supervised learning and active learning. In our experiments, the augmented networks significantly outperform the networks without fully-connected layers, reaching a relative improvement of up to $16\%$ in validation accuracy in the supervised setting without adding any extra parameters during inference.
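The abstract does not state the exact training objective, so the following is only a minimal sketch of what an online joint knowledge-distillation loss could look like, not the authors' implementation. It assumes two classification heads on a shared CNN backbone: an FC head used only at train time (acting as the teacher) and a plain linear head kept at test time (acting as the student). The function names and the `alpha` and `temperature` parameters are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_distillation_loss(fc_logits, linear_logits, label,
                            alpha=1.0, temperature=1.0):
    """Assumed joint objective for one sample.

    fc_logits:     output of the train-time-only FC head (teacher).
    linear_logits: output of the head kept at test time (student).
    Both heads are supervised by the label, and the student is also
    pulled toward the teacher's predictive distribution.
    """
    teacher = softmax(fc_logits, temperature)
    student = softmax(linear_logits, temperature)
    supervised = cross_entropy(fc_logits, label) + cross_entropy(linear_logits, label)
    distillation = kl_divergence(teacher, student)
    return supervised + alpha * distillation
```

Under these assumptions, only the student head would be evaluated at inference, which is consistent with the claim that no extra parameters are added at test time. When both heads produce identical logits the distillation term vanishes and the loss reduces to the two supervised terms.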
