PyTorch Image Classification. This is just a toy example for study purposes. Most of the code is ported from timm.
Requirements: torch, torchvision, gdown
How to run this?
torchrun --nproc_per_node=2 train.py imageNet --cuda 7,8
python3 train.py imageNet --cuda 7
torchrun --nproc_per_node=2 multi_train.py tiny_a_25 tiny_a_50 cifar100_a_25 cifar100_a_50 cifar10_a_25 cifar10_a_50 -m resnext50_32_4 resnext101_32_4 -c 7,8 -o log/resnext
What does each variable mean?
- nproc_per_node - number of processes torchrun launches on each node (typically one per GPU)
- cuda - comma-separated indices of the GPU device(s) to use (e.g. 7 or 7,8)
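As a rough illustration of how these two settings can interact: torchrun starts nproc_per_node processes and gives each one a LOCAL_RANK environment variable, and a script can map that rank onto the list passed via --cuda. The helper `pick_device` below is hypothetical and is not taken from train.py; it only sketches one plausible mapping.

```python
def pick_device(cuda_arg: str, local_rank: int) -> int:
    """Return the GPU index one process should use (hypothetical helper).

    cuda_arg   -- comma-separated GPU indices, e.g. "7,8"
    local_rank -- rank of this process on the node; under torchrun this
                  would be int(os.environ["LOCAL_RANK"])
    """
    gpu_ids = [int(x) for x in cuda_arg.split(",") if x.strip()]
    if not gpu_ids:
        raise ValueError("no GPU index given")
    if local_rank < 0 or local_rank >= len(gpu_ids):
        raise ValueError("local_rank does not match the GPUs listed in --cuda")
    return gpu_ids[local_rank]


# Simulate the two processes started by --nproc_per_node=2 with --cuda 7,8.
for rank in range(2):
    print(rank, pick_device("7,8", rank))
```

Under this assumed mapping, the number of indices given to --cuda should be at least nproc_per_node, which matches the example commands (one index for python3, two for --nproc_per_node=2).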
torchrun --nproc_per_node=2 train.py imageNet --interpolation bicubic --lr 1e-3 --epoch 50 --warmup-lr 1e-6 -j 8 --pin-memory --amp --channels-last --cuda 7,8 --who hankyul --exp-target model_name
- You should either choose an experiment target (--exp-target in the command above) or set the experiment name manually.
- You should specify your name (--who in the command above) and project name.
- At test time, the resize operation works like this:
(H, W) -> resize -> (test_size/crop_ptr, test_size/crop_ptr) -> center_crop -> (test_size, test_size)
This confused me at first, but it means you can set the test image size directly.
- Validation scores differ slightly between distributed and non-distributed mode. This happens when
len(dataset) % (batch_size * ngpu) != 0
In that case, the batch sizes of each GPU's last iteration differ, and averaging their results together gives a slightly different validation score. The difference is about 0.04%.
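The test-time resize pipeline described in the notes above comes down to simple arithmetic. The sketch below computes the intermediate resize size and the final crop size from test_size and crop_ptr. The function name `eval_resize_plan` is hypothetical, and rounding the intermediate size down is an assumption; the actual code may round differently.

```python
import math


def eval_resize_plan(test_size: int, crop_ptr: float) -> tuple:
    """Return (resize_size, crop_size) for test-time preprocessing.

    The image is first resized to test_size / crop_ptr, then center-cropped
    to test_size. Rounding down is an assumption made for this sketch.
    """
    if not 0.0 < crop_ptr <= 1.0:
        raise ValueError("crop_ptr must be in (0, 1]")
    resize_size = math.floor(test_size / crop_ptr)
    return resize_size, test_size


# With test_size=224 and crop_ptr=0.875 the image is resized to 256,
# then center-cropped to 224.
print(eval_resize_plan(224, 0.875))
```

The final image size always equals test_size, whatever crop_ptr is; crop_ptr only controls how much of the resized image the center crop keeps.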
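The validation-score mismatch noted above can be reproduced with a toy example. The sketch below assumes that per-GPU accuracies from the last iteration are averaged with equal weight, regardless of how many samples each GPU received; the numbers are made up for illustration and are not measurements from this repository.

```python
def accuracy(flags):
    """Fraction of correct predictions (1 = correct, 0 = wrong)."""
    return sum(flags) / len(flags)


def mean_of_means(shards):
    """Average per-GPU accuracies with equal weight, ignoring shard size."""
    return sum(accuracy(s) for s in shards) / len(shards)


def weighted_mean(shards):
    """Accuracy over all samples, as a single process would compute it."""
    total_correct = sum(sum(s) for s in shards)
    total_samples = sum(len(s) for s in shards)
    return total_correct / total_samples


# Toy last iteration: GPU 0 receives 4 samples, GPU 1 only 2.
gpu0 = [1, 1, 1, 0]  # 3 of 4 correct
gpu1 = [0, 0]        # 0 of 2 correct

print(mean_of_means([gpu0, gpu1]))  # 0.375
print(weighted_mean([gpu0, gpu1]))  # 0.5
```

When every GPU receives the same number of samples, the two functions agree, which is why the gap only appears when the dataset size is not divisible by batch_size * ngpu.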