Big Self-Supervised Models are Strong Semi-Supervised Learners — Paper Summary

Introduction

Learning from just a few labeled examples while making the best use of a large amount of unlabeled data is a long-standing problem in machine learning. SimCLRv2 makes substantial progress toward solving it.

What’s New

A team led by Ting Chen and colleagues at Google Research proposed SimCLRv2, a modification of SimCLRv1. It is a semi-supervised approach that leverages unlabeled data in a task-agnostic way during pre-training, fine-tunes on the available labeled examples, and then uses the unlabeled data again for task-specific distillation.

Key Insight

The “unsupervised pre-train, supervised fine-tune” paradigm has been widely used in natural language processing, where one first trains a large language model on unlabeled data and then fine-tunes the model on a few labeled examples. This paradigm has received less attention in computer vision, where a common alternative leverages unlabeled data during supervised learning as a form of regularization. Motivated by these approaches, SimCLRv2 presents an investigation of the “unsupervised pre-train, supervised fine-tune” paradigm for semi-supervised learning on ImageNet.

How it works

The researchers evaluated the approach on the popular ImageNet dataset. The method can be summarized in three main steps: pre-train (unsupervised), fine-tune (supervised), and then distill with unlabeled data.
● First, the unlabeled data is used in a task-agnostic way to learn general (visual) representations via unsupervised pre-training.
● The general representations are then adapted for a specific task via supervised fine-tuning.
● After that, the unlabeled data is used a second time to further improve predictive performance and obtain a compact model. To this end, a student network is trained on the unlabeled data with labels imputed by the fine-tuned teacher network (a sketch of this distillation step follows the list).
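
To make the final distillation step more concrete, here is a minimal PyTorch-style sketch of training a student on teacher-imputed labels. It is an illustration under assumptions rather than the authors' code: the teacher and student models, the unlabeled data loader, and the temperature value are hypothetical placeholders.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, tau=1.0):
    # The fine-tuned teacher's temperature-softened predictions act as
    # imputed (soft) labels; the student minimizes the cross-entropy
    # between its own softened predictions and the teacher's.
    teacher_probs = F.softmax(teacher_logits / tau, dim=-1)
    student_log_probs = F.log_softmax(student_logits / tau, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

def distill(teacher, student, unlabeled_loader, optimizer, tau=1.0, device="cuda"):
    # teacher: the fine-tuned network (kept frozen)
    # student: a network trained only on unlabeled images
    teacher.eval()
    student.train()
    for images in unlabeled_loader:              # no labels are needed here
        images = images.to(device)
        with torch.no_grad():
            teacher_logits = teacher(images)     # imputed labels
        student_logits = student(images)
        loss = distillation_loss(student_logits, teacher_logits, tau)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In the paper, the student can share the teacher's architecture (self-distillation) or be a smaller network, which is what yields the compact model mentioned above.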

Results

This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels (at most 13 labeled images per class) using ResNet-50, a 10x improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with this method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.

Why it matters

The findings described in SimCLRv2 can potentially be harnessed to improve accuracy in any computer vision application where labeling additional data is more expensive or challenging than training a larger model.


We’re thinking

There is an entire industry built around human labeling services. Improvements to the proposed approach could lead to a short-term loss of income for some of those currently employed or contracted to provide labels.

Anish Shrestha

I'm a certified TensorFlow developer and a software engineer specializing in building AI-based solutions, web applications, and everything in between.