
Simple I-JEPA

A simple and efficient PyTorch implementation of Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA).

Results

The model was pre-trained on 100,000 unlabeled images from the STL-10 dataset. For evaluation, a logistic regression classifier was trained on frozen features from the 5k labeled STL-10 training images and tested on the 8k STL-10 test images.

Linear probing was done with scikit-learn's LogisticRegression model on features extracted from the frozen encoder. Image resolution was 96x96.
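
For reference, the probing protocol amounts to roughly the following sketch. The `encoder` callable and the data loaders here are hypothetical placeholders, not this repo's actual API; the real evaluation code is in the notebooks.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

# Hypothetical helper: `encoder` is the frozen, pre-trained encoder and `loader`
# yields (images, labels) batches from STL-10.
@torch.no_grad()
def extract_features(encoder, loader, device="cuda"):
    encoder.eval()
    feats, labels = [], []
    for images, targets in loader:
        emb = encoder(images.to(device))   # (B, num_patches, emb_dim) or (B, emb_dim)
        if emb.ndim == 3:                  # average patch tokens into one vector per image
            emb = emb.mean(dim=1)
        feats.append(emb.cpu().numpy())
        labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# Linear probe: fit LogisticRegression on frozen train features, report top-1
# accuracy on the test split.
# X_train, y_train = extract_features(encoder, train_loader)
# X_test, y_test = extract_features(encoder, test_loader)
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# print("top-1 accuracy:", clf.score(X_test, y_test))
```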

More detailed evaluation steps and results for STL-10 can be found in the notebooks directory.

| Dataset | Approach | Encoder           | Emb. dim | Patch size | Num. targets | Batch size | Epochs | Top-1 acc. (%) |
|---------|----------|-------------------|----------|------------|--------------|------------|--------|----------------|
| STL-10  | I-JEPA   | VisionTransformer | 512      | 8          | 4            | 256        | 100    | 77.07          |

All experiments were done using a very small and shallow VisionTransformer (only ~11M params) with the following parameters (a rough sketch of this configuration follows the list):

  • embedding dimension - 512
  • depth (number of transformer layers) - 6
  • number of heads - 6
  • mlp dim - 2 * embedding dimension
  • patch size - 8
  • number of targets - 4
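
For illustration, here is a minimal sketch of an encoder with roughly this configuration. The class names and layer layout below are hypothetical, not the repo's actual VisionTransformer code; they only show what an ~11M-parameter model with these hyperparameters looks like.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's VisionTransformer, sized to the list above
# (emb_dim=512, depth=6, heads=6, mlp_dim=2*emb_dim, patch_size=8). Attention uses
# an explicit per-head dimension so 6 heads can be paired with a 512-dim embedding,
# as in common ViT implementations.
class Attention(nn.Module):
    def __init__(self, dim, heads=6, dim_head=64):
        super().__init__()
        inner = heads * dim_head
        self.heads, self.scale = heads, dim_head ** -0.5
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.to_out = nn.Linear(inner, dim)

    def forward(self, x):
        b, n, _ = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in qkv)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)

class Block(nn.Module):
    def __init__(self, dim, heads, mlp_dim):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = Attention(dim, heads)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_dim), nn.GELU(), nn.Linear(mlp_dim, dim))

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.mlp(self.norm2(x))

class TinyViT(nn.Module):
    def __init__(self, img_size=96, patch_size=8, dim=512, depth=6, heads=6):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2   # 12 x 12 = 144 patches
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.blocks = nn.ModuleList([Block(dim, heads, 2 * dim) for _ in range(depth)])
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, 144, 512)
        x = x + self.pos_embed
        for blk in self.blocks:
            x = blk(x)
        return self.norm(x)  # per-patch embeddings

model = TinyViT()
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params")  # ~11M
```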

The mask generator is inspired by the original paper, but slightly simplified.
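
To illustrate the general idea only (this is not the repo's exact masking code): sample a few rectangular target blocks on the patch grid and use their complement as the context, along these lines:

```python
import random
import torch

# Illustrative simplified block masking on the patch grid (not the repo's exact
# implementation). Samples `num_targets` rectangular target blocks of patches;
# the context is every patch index not covered by any target block.
def sample_masks(grid_size=12, num_targets=4, min_side=3, max_side=5):
    covered = torch.zeros(grid_size, grid_size, dtype=torch.bool)
    target_masks = []
    for _ in range(num_targets):
        h, w = random.randint(min_side, max_side), random.randint(min_side, max_side)
        top, left = random.randint(0, grid_size - h), random.randint(0, grid_size - w)
        block = torch.zeros(grid_size, grid_size, dtype=torch.bool)
        block[top:top + h, left:left + w] = True
        target_masks.append(block.flatten().nonzero(as_tuple=True)[0])  # indices of one target block
        covered |= block
    context_mask = (~covered).flatten().nonzero(as_tuple=True)[0]       # indices of the context patches
    return context_mask, target_masks

# Example: these indices can be used to gather context/target patch tokens from a
# (B, 144, 512) token sequence produced by the encoder.
ctx, tgts = sample_masks()
print(len(ctx), [len(t) for t in tgts])
```

In I-JEPA, the context indices feed the context encoder, and the predictor tries to match the target encoder's embeddings at each target block's indices.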

Usage

Installation

To set up the code, clone the repository, optionally create a virtual environment, and install the requirements:

  1. clone the repository: git clone git@github.com:filipbasara0/simple-ijepa.git
  2. create virtual environment: virtualenv -p python3.10 env
  3. activate virtual environment: source env/bin/activate
  4. install requirements: pip install .

Examples

The STL-10 model was trained with the following command:

python run_training.py --fp16_precision --log_every_n_steps 200 --num_epochs 100 --batch_size 256

Detailed options

Once the code is set up, run the following command with the options listed below: python run_training.py [args...]

I-JEPA

options:
  -h, --help            show this help message and exit
  --dataset_path DATASET_PATH
                        Path where datasets will be saved
  --dataset_name {stl10}
                        Dataset name
  -save_model_dir SAVE_MODEL_DIR
                        Path where models are saved
  --num_epochs NUM_EPOCHS
                        Number of epochs for training
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Batch size
  -lr LEARNING_RATE, --learning_rate LEARNING_RATE
  -wd WEIGHT_DECAY, --weight_decay WEIGHT_DECAY
  --fp16_precision      Whether to use 16-bit precision for GPU training
  --emb_dim EMB_DIM     Transformer embedding dim
  --log_every_n_steps LOG_EVERY_N_STEPS
                        Log every n steps
  --gamma GAMMA         Initial EMA coefficient
  --update_gamma_after_step UPDATE_GAMMA_AFTER_STEP
                        Update EMA gamma after this step
  --update_gamma_every_n_steps UPDATE_GAMMA_EVERY_N_STEPS
                        Update EMA gamma every this many steps
  --ckpt_path CKPT_PATH
                        Specify path to ijepa_model.pth to resume training
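
The gamma options control the exponential moving average (EMA) that keeps the target encoder a slowly moving copy of the context encoder. The update amounts to something like the sketch below (illustrative, not necessarily the repo's exact code):

```python
import torch

# Illustrative EMA update of the target encoder from the context encoder.
# `gamma` corresponds to the --gamma coefficient; --update_gamma_after_step and
# --update_gamma_every_n_steps control when and how often gamma itself is adjusted.
@torch.no_grad()
def ema_update(target_encoder, context_encoder, gamma):
    for t_param, c_param in zip(target_encoder.parameters(),
                                context_encoder.parameters()):
        t_param.mul_(gamma).add_(c_param, alpha=1.0 - gamma)
```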

Citation

@misc{assran2023selfsupervisedlearningimagesjointembedding,
      title={Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture}, 
      author={Mahmoud Assran and Quentin Duval and Ishan Misra and Piotr Bojanowski and Pascal Vincent and Michael Rabbat and Yann LeCun and Nicolas Ballas},
      year={2023},
      eprint={2301.08243},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2301.08243}, 
}