Commits

List of commits on branch main.

  • 542278f8b78e938d26da369c817914ec9bbdbdc1 (Verified) docs: fix readme link (hankyul2 committed 2 years ago)
  • 8ec18be97f9daa66412edf264c7df36ca9f6ffd2 (Verified) docs: remove unnecessary comments from readme (hankyul2 committed 3 years ago)
  • bce59dae3ce69e3e7e8aa99e4f32214b015dd1f8 (Unverified) feat: remove ema callback (hankyul2 committed 3 years ago)
  • 5e9e7e099d672ad77d2d044038bcda1dcdc29718 (Unverified) docs: Update README (hankyul2 committed 3 years ago)
  • 4d2b4588b9e22681c07de522207c9d17a7ade391 (Unverified) feat: Add utils & config files (hankyul2 committed 3 years ago)
  • 2c36d7eae61e52c804acb4145c820056a5449e45 (Unverified) docs: Update score (hankyul2 committed 3 years ago)

README


EfficientNetV2-pytorch

Unofficial PyTorch implementation of EfficientNetV2.

Index

  1. Tutorial
  2. Experiment Results
  3. Experiment Setup
  4. References

Tutorial

Colab Tutorial

How to load a pretrained model?

If you just want to use a pretrained model, load it with torch.hub.load:

import torch

# EfficientNetV2-S pretrained on ImageNet (nclass=1000 output classes)
model = torch.hub.load('hankyul2/EfficientNetV2-pytorch', 'efficientnet_v2_s', pretrained=True, nclass=1000)
print(model)

Available model names: efficientnet_v2_{s|m|l} (ImageNet), efficientnet_v2_{s|m|l}_in21k (ImageNet21k)
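
As a quick sanity check, the sketch below enumerates the hub entry points and runs a dummy forward pass. This is not part of the original README: the 224x224 input size follows the fine-tuning setup later in this document, and the printed output shape assumes the classifier head has nclass outputs.

import torch

# list the model names this repository exposes through torch.hub
print(torch.hub.list('hankyul2/EfficientNetV2-pytorch'))

model = torch.hub.load('hankyul2/EfficientNetV2-pytorch', 'efficientnet_v2_s', pretrained=True, nclass=1000)
model.eval()

# dummy forward pass: one 3x224x224 image (224 is the image size used for fine-tuning below)
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # expected: torch.Size([1, 1000])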

How to fine-tune the model?

If you want to fine-tune on CIFAR, use this repository (a minimal plain-PyTorch fine-tuning sketch follows the steps below).

  1. Clone this repo and install dependencies

    git clone https://github.com/hankyul2/EfficientNetV2-pytorch.git
    pip3 install -r requirements.txt
  2. Train & test the model (see more examples in tmuxp/cifar.yaml)

    python3 main.py fit --config config/efficientnetv2_s/cifar10.yaml --trainer.gpus 2,3,
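
The repository drives training through a Lightning-style CLI (main.py fit --config ...). For readers who prefer plain PyTorch, here is a minimal hand-rolled fine-tuning sketch on CIFAR-10 that mirrors the setup described below (AdamW, OneCycle LR, label smoothing, 224x224 inputs). It is an illustration, not the repository's training code; in particular, passing nclass=10 to get a 10-way head, the ImageNet normalization statistics, and the label-smoothing value 0.1 are assumptions.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# CIFAR-10 training pipeline roughly following the setup below (224x224, flip, random crop with pad=4)
train_tf = transforms.Compose([
    transforms.Resize(224),
    transforms.RandomCrop(224, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),  # assumed ImageNet stats
])
train_set = datasets.CIFAR10('data', train=True, download=True, transform=train_tf)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

# nclass=10 is assumed to swap in a 10-way CIFAR-10 classifier head
model = torch.hub.load('hankyul2/EfficientNetV2-pytorch', 'efficientnet_v2_s', pretrained=True, nclass=10).to(device)

epochs = 20
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.005)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=len(train_loader))
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # smoothing value is an assumption

model.train()
for epoch in range(epochs):
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()  # OneCycle LR steps once per batch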

Experiment Results

Model Name | Pretrained Dataset | CIFAR10 Acc. (%) | CIFAR100 Acc. (%)
EfficientNetV2-S | ImageNet | 98.46 (tf.dev, weight) | 90.05 (tf.dev, weight)
EfficientNetV2-M | ImageNet | 98.89 (tf.dev, weight) | 91.54 (tf.dev, weight)
EfficientNetV2-L | ImageNet | 98.80 (tf.dev, weight) | 91.88 (tf.dev, weight)
EfficientNetV2-S-in21k | ImageNet21k | 98.50 (tf.dev, weight) | 90.96 (tf.dev, weight)
EfficientNetV2-M-in21k | ImageNet21k | 98.70 (tf.dev, weight) | 92.06 (tf.dev, weight)
EfficientNetV2-L-in21k | ImageNet21k | 98.78 (tf.dev, weight) | 92.08 (tf.dev, weight)
EfficientNetV2-XL-in21k | ImageNet21k | - | -

Note

  1. The results are a combination of:
    • Half precision
    • Super-convergence (epochs=20)
    • AdamW (weight_decay=0.005)
    • EMA (decay=0.999), see the sketch after this list
    • CutMix (prob=1.0)
  2. Changes from the original paper (CIFAR):
    1. We ran only 20 epochs to obtain the results above; training for more epochs should give higher accuracy.
    2. Changes from the original setup: optimizer (SGD to AdamW), LR scheduler (cosine LR to OneCycle LR), augmentation (Cutout to CutMix), image size (384 to 224), epochs (105 to 20).
    3. Hyper-parameter importance (most to least): LR -> weight_decay -> ema_decay -> cutmix_prob -> epochs.
  3. You can reproduce these results by running tmuxp/cifar.yaml.
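
A later commit removed the repository's EMA callback, but EMA is still part of the recipe above. For reference, here is a generic EMA sketch (an illustration, not the repository's implementation): a shadow copy of the weights is updated after every optimizer step and used at evaluation time.

import copy
import torch

class EMA:
    """Exponential moving average of model weights (decay=0.999 as in the notes above)."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()  # averaged copy used for evaluation
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow = decay * shadow + (1 - decay) * current weights; non-float buffers are copied
        for s, p in zip(self.shadow.state_dict().values(), model.state_dict().values()):
            if s.dtype.is_floating_point:
                s.mul_(self.decay).add_(p, alpha=1 - self.decay)
            else:
                s.copy_(p)

# usage sketch: ema = EMA(model); call ema.update(model) after each optimizer.step();
# evaluate with ema.shadow instead of model.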

Experiment Setup

  1. CIFAR setup

    Category | Contents
    Dataset | CIFAR10, CIFAR100
    Batch size per GPU | (s, m, l) = (256, 128, 64)
    Train augmentation | image_size = 224, horizontal flip, random_crop (pad=4), CutMix (prob=1.0)
    Test augmentation | image_size = 224, center_crop
    Model | EfficientNetV2 s / m / l (pretrained on in1k or in21k)
    Regularization | Dropout=0.0, stochastic depth=0.2, BatchNorm
    Optimizer | AdamW (weight_decay=0.005)
    Criterion | CrossEntropyLoss with label smoothing
    LR scheduler | LR: (s, m, l) = (0.001, 0.0005, 0.0003); OneCycle LR (epochs=20)
    GPUs & etc. | 16-bit precision; EMA decay (s, m, l) = (0.999, 0.9993, 0.9995); S: 2 x RTX 3090 (total batch size 512); M: 2 x RTX 3090 (total batch size 256); L: 2 x RTX 3090 (total batch size 128)
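
The training augmentation above uses CutMix with prob=1.0. For reference, here is a minimal CutMix sketch in the standard formulation (an illustration; the repository's own implementation may differ): a random patch is pasted from a shuffled copy of the batch and the two labels are mixed in proportion to the patch area.

import torch

def cutmix(images, targets, prob=1.0, alpha=1.0):
    """Minimal CutMix: returns mixed images, the two label sets, and the mixing ratio lam."""
    if torch.rand(1).item() > prob:
        return images, targets, targets, 1.0
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(images.size(0))
    h, w = images.shape[-2:]
    # patch side lengths scale with sqrt(1 - lam)
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    images[:, :, y1:y2, x1:x2] = images[index, :, y1:y2, x1:x2]
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)  # recompute lam from the actual patch area
    return images, targets, targets[index], lam

# usage sketch inside a training step:
# mixed, targets_a, targets_b, lam = cutmix(images, targets)
# out = model(mixed)
# loss = lam * criterion(out, targets_a) + (1 - lam) * criterion(out, targets_b)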

References

EfficientNetV2: Smaller Models and Faster Training. Mingxing Tan and Quoc V. Le, ICML 2021. https://arxiv.org/abs/2104.00298