finetrainers 🧪

cogvideox-factory was renamed to finetrainers. If you're looking to train CogVideoX or Mochi with the legacy training scripts, please refer to this README instead. Everything in the training/ directory will eventually be moved to, and supported under, finetrainers.

FineTrainers is a work-in-progress library to support (accessible) training of video models. Our first priority is to support LoRA training for all popular video models in Diffusers, with other methods such as ControlNets, Control-LoRAs, and distillation to follow.

News

  • 🔥 2025-01-15: Support for naive FP8 weight-casting training added! This allows training HunyuanVideo in under 24 GB up to specific resolutions.
  • 🔥 2025-01-13: Support for T2V full-finetuning added! Thanks to @ArEnSc for taking up the initiative!
  • 🔥 2025-01-03: Support for T2V LoRA finetuning of CogVideoX added!
  • 🔥 2024-12-20: Support for T2V LoRA finetuning of Hunyuan Video added! We would like to thank @SHYuanBest for his work on a training script here.
  • 🔥 2024-12-18: Support for T2V LoRA finetuning of LTX Video added!

Table of Contents

  • Quickstart
  • Support Matrix
  • Acknowledgements

Quickstart

Clone the repository and make sure the requirements are installed: pip install -r requirements.txt, then install diffusers from source with pip install git+https://github.com/huggingface/diffusers. The requirements pin diffusers>=0.32.1, but it is recommended to use the main branch for the latest features and bugfixes.
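
The steps above as a minimal sketch (the repository URL is assumed from the project name; adjust to your actual clone source):

git clone https://github.com/a-r-r-o-w/finetrainers
cd finetrainers
pip install -r requirements.txt
# diffusers from source, for the latest features and bugfixes
pip install git+https://github.com/huggingface/diffusers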

Then download a dataset:

pip install huggingface_hub  # provides the `huggingface-cli` tool
huggingface-cli download \
  --repo-type dataset Wild-Heart/Disney-VideoGeneration-Dataset \
  --local-dir video-dataset-disney

Then launch LoRA fine-tuning. Below we provide an example for LTX-Video; see docs/training for more details.

[!IMPORTANT] It is recommended to use PyTorch 2.5.1 or above for training. Older versions can lead to completely black videos, OOM errors, or other issues, and are not tested.
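
Before launching, it can help to confirm which PyTorch version is actually in the environment (a simple check, nothing finetrainers-specific):

python -c "import torch; print(torch.__version__)"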

Training command

TODO: LTX does not do too well with the Disney dataset. We will update this to use a better example soon.

#!/bin/bash
export WANDB_MODE="offline"
export NCCL_P2P_DISABLE=1
export TORCH_NCCL_ENABLE_MONITORING=0
export FINETRAINERS_LOG_LEVEL=DEBUG

GPU_IDS="0,1"

DATA_ROOT="/path/to/video-dataset-disney"
CAPTION_COLUMN="prompts.txt"
VIDEO_COLUMN="videos.txt"
OUTPUT_DIR="/path/to/output/directory/ltx-video/ltxv_disney"

ID_TOKEN="BW_STYLE"
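# The ID token acts as a trigger word; it is prepended to every caption via --id_token below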

# Model arguments
model_cmd="--model_name ltx_video \
  --pretrained_model_name_or_path Lightricks/LTX-Video"

# Dataset arguments
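# Resolution buckets are <frames>x<height>x<width>; 49x512x768 trains on 49-frame clips at 512x768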
dataset_cmd="--data_root $DATA_ROOT \
  --video_column $VIDEO_COLUMN \
  --caption_column $CAPTION_COLUMN \
  --id_token $ID_TOKEN \
  --video_resolution_buckets 49x512x768 \
  --caption_dropout_p 0.05"

# Dataloader arguments
dataloader_cmd="--dataloader_num_workers 0"

# Diffusion arguments
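# logit_normal biases sampled flow-matching timesteps toward the middle of the schedule (SD3-style)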
diffusion_cmd="--flow_weighting_scheme logit_normal"

# Training arguments
training_cmd="--training_type lora \
  --seed 42 \
  --batch_size 1 \
  --train_steps 3000 \
  --rank 128 \
  --lora_alpha 128 \
  --target_modules to_q to_k to_v to_out.0 \
  --gradient_accumulation_steps 4 \
  --gradient_checkpointing \
  --checkpointing_steps 500 \
  --checkpointing_limit 2 \
  --enable_slicing \
  --enable_tiling"

# Optimizer arguments
optimizer_cmd="--optimizer adamw \
  --lr 3e-5 \
  --lr_scheduler constant_with_warmup \
  --lr_warmup_steps 100 \
  --lr_num_cycles 1 \
  --beta1 0.9 \
  --beta2 0.95 \
  --weight_decay 1e-4 \
  --epsilon 1e-8 \
  --max_grad_norm 1.0"

# Miscellaneous arguments
miscellaneous_cmd="--tracker_name finetrainers-ltxv \
  --output_dir $OUTPUT_DIR \
  --nccl_timeout 1800 \
  --report_to wandb"

cmd="accelerate launch --config_file accelerate_configs/uncompiled_2.yaml --gpu_ids $GPU_IDS train.py \
  $model_cmd \
  $dataset_cmd \
  $dataloader_cmd \
  $diffusion_cmd \
  $training_cmd \
  $optimizer_cmd \
  $miscellaneous_cmd"

echo "Running command: $cmd"
eval $cmd
echo -ne "-------------------- Finished executing script --------------------\n\n"

Here we are using two GPUs, but single-GPU training is possible by setting GPU_IDS="0" (see the sketch below). By default, we enable some simple optimizations to reduce memory consumption, such as gradient checkpointing. Please refer to docs/training/optimizations to learn about the memory optimizations currently supported.
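
For example, a single-GPU launch only changes the device list and the accelerate config (uncompiled_1.yaml is an assumed filename; use whichever single-GPU config exists under accelerate_configs/ in your checkout):

GPU_IDS="0"
# Assumed single-GPU config name; check accelerate_configs/ for the actual file
accelerate launch --config_file accelerate_configs/uncompiled_1.yaml --gpu_ids $GPU_IDS train.py \
  $model_cmd $dataset_cmd $dataloader_cmd $diffusion_cmd $training_cmd $optimizer_cmd $miscellaneous_cmd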

For inference, refer here. For docs related to the other supported models, refer here.

Support Matrix

Model Name   | Tasks         | Min. LoRA VRAM* | Min. Full Finetuning VRAM^
LTX-Video    | Text-to-Video | 5 GB            | 21 GB
HunyuanVideo | Text-to-Video | 32 GB           | OOM
CogVideoX-5b | Text-to-Video | 18 GB           | 53 GB

*Reported for training only (no validation) at resolution 49x512x768, rank 128, with pre-computation, using FP8 weights and gradient checkpointing. Pre-computation of conditions and latents may peak higher (but typically stays under 16 GB).
^Reported for training only (no validation) at resolution 49x512x768, with pre-computation, using BF16 weights and gradient checkpointing.
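
The pre-computation referred to above is a one-time pass that caches text embeddings and VAE latents, so the text encoders and VAE need not stay resident during training. As a sketch, assuming the training CLI exposes a --precompute_conditions flag (verify the flag name against docs/training for your version):

# Assumed flag; caches conditions/latents once before training starts
dataset_cmd="$dataset_cmd --precompute_conditions"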

If you would like to use a custom dataset, refer to the dataset preparation guide here.

Acknowledgements

  • finetrainers builds on top of a body of great open-source libraries: transformers, accelerate, peft, diffusers, bitsandbytes, torchao, deepspeed -- to name a few.
  • Some of the design choices of finetrainers were inspired by SimpleTuner.