
Stable Diffusion 3.5

A tiny, inference-only reference implementation of SD3.5 and SD3: everything you need for simple inference with SD3.5/SD3, as well as the SD3.5 Large ControlNets, excluding the weight files.

Contains code for the text encoders (OpenAI CLIP-L/14, OpenCLIP bigG, Google T5-XXL; all publicly available models), the VAE decoder (similar to previous SD models, but with 16 channels and no postquantconv step), and the core MM-DiT (entirely new).
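Assuming the VAE keeps the usual factor-8 spatial downsampling of earlier SD models (only the channel count changed to 16), the latent tensor shape for a given image size can be sketched as:

```python
def latent_shape(batch: int, width: int, height: int) -> tuple:
    """Latent tensor shape for the 16-channel SD3.5 VAE.

    Assumes the usual factor-8 spatial downsampling of SD VAEs;
    only the channel count (16 instead of 4) differs from SD1/SDXL.
    """
    assert width % 8 == 0 and height % 8 == 0, "dimensions must be divisible by 8"
    return (batch, 16, height // 8, width // 8)

print(latent_shape(1, 1024, 1024))  # (1, 16, 128, 128)
```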

Note: this repo is a reference library meant to assist partner organizations in implementing SD3.5/SD3. For alternate inference, use Comfy.

Updates

  • Nov 26, 2024 : Released ControlNets for SD3.5-Large.
  • Oct 29, 2024 : Released inference code for SD3.5-Medium.
  • Oct 24, 2024 : Updated code license to MIT License.
  • Oct 22, 2024 : Released inference code for SD3.5-Large, Large-Turbo. Also works on SD3-Medium.

Download

Download the following models from HuggingFace into the models directory:

  1. Stability AI SD3.5 Large or Stability AI SD3.5 Large Turbo or Stability AI SD3.5 Medium
  2. OpenAI CLIP-L
  3. OpenCLIP bigG
  4. Google T5-XXL

This code also works for Stability AI SD3 Medium.
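Before running inference, it can help to confirm that every required weight file actually landed in the models directory. A small stdlib-only sketch (the file names follow the File Guide below; the missing_models helper is illustrative, not part of the repo):

```python
from pathlib import Path

# File names from the File Guide; sd3.5_large.safetensors stands in for
# whichever main checkpoint you downloaded.
REQUIRED = [
    "clip_l.safetensors",
    "clip_g.safetensors",
    "t5xxl.safetensors",
    "sd3.5_large.safetensors",
]

def missing_models(models_dir: str = "models") -> list:
    """Return the required weight files not yet present in models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED if not (root / name).exists()]

if __name__ == "__main__":
    absent = missing_models()
    if absent:
        print("Missing:", ", ".join(absent))
```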

ControlNets

Optionally, download SD3.5 Large ControlNets:

from huggingface_hub import hf_hub_download
hf_hub_download("stabilityai/stable-diffusion-3.5-controlnets", "sd3.5_large_controlnet_blur.safetensors", local_dir="models")
hf_hub_download("stabilityai/stable-diffusion-3.5-controlnets", "sd3.5_large_controlnet_canny.safetensors", local_dir="models")
hf_hub_download("stabilityai/stable-diffusion-3.5-controlnets", "sd3.5_large_controlnet_depth.safetensors", local_dir="models")

Install

# Note: on Windows use "python" instead of "python3"
python3 -s -m venv .sd3.5
source .sd3.5/bin/activate
# or on Windows: .sd3.5\Scripts\activate
python3 -s -m pip install -r requirements.txt

Run

# Generate a cat using SD3.5 Large model (at models/sd3.5_large.safetensors) with its default settings
python3 sd3_infer.py --prompt "cute wallpaper art of a cat"
# Or use a text file with a list of prompts, using SD3.5 Large
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_large.safetensors
# Generate from prompt file using SD3.5 Large Turbo with its default settings
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_large_turbo.safetensors
# Generate from prompt file using SD3.5 Medium with its default settings, at 2k resolution
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_medium.safetensors --width 1920 --height 1080
# Generate from prompt file using SD3 Medium with its default settings
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3_medium.safetensors
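The prompt file mentioned above is plain text. A reasonable reading of the format (one prompt per line, blank lines skipped) is sketched below; this is a guess at the convention, so check sd3_infer.py for the parser the repo actually uses:

```python
def load_prompts(path: str) -> list:
    """Read one prompt per line, skipping blank lines.

    Illustrative only: check sd3_infer.py for the parsing logic
    the repo actually applies to --prompt files.
    """
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]
```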

Images will be output to outputs/<MODEL>/<PROMPT>_<DATETIME>_<POSTFIX> by default. To add a postfix to the output directory, add --postfix <my_postfix>. For example,

python3 sd3_infer.py --prompt path/to/my_prompts.txt --postfix "steps100" --steps 100
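The documented outputs/&lt;MODEL&gt;/&lt;PROMPT&gt;_&lt;DATETIME&gt;_&lt;POSTFIX&gt; layout can be sketched as follows. build_output_dir is a hypothetical helper mirroring that pattern; sd3_infer.py's own naming may differ in details such as how prompts are sanitized:

```python
from datetime import datetime
from pathlib import Path

def build_output_dir(model: str, prompt: str, postfix: str = "") -> Path:
    """Mirror the documented outputs/<MODEL>/<PROMPT>_<DATETIME>_<POSTFIX> layout.

    Hypothetical helper for illustration; not code from the repo.
    """
    stamp = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    slug = "_".join(prompt.split())[:50]  # crude prompt sanitization for illustration
    name = f"{slug}_{stamp}" + (f"_{postfix}" if postfix else "")
    return Path("outputs") / Path(model).stem / name
```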

To change the resolution of the generated image, add --width <WIDTH> --height <HEIGHT>.

Optionally, use Skip Layer Guidance for potentially better structure and anatomy coherence from SD3.5-Medium.

python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_medium.safetensors --skip_layer_cfg True
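Skip Layer Guidance extends classifier-free guidance with an extra term comparing the full conditional prediction against one computed with certain transformer layers skipped. A hedged sketch of the combination step; the scales here are placeholders, and the actual defaults and skipped-layer choices live in sd3_infer.py:

```python
def guided_prediction(uncond, cond, cond_skip, cfg_scale=5.0, slg_scale=2.5):
    """Combine classifier-free guidance with a skip-layer term.

    cond_skip is the conditional prediction with some DiT layers skipped;
    the extra term pushes the sample away from that degraded prediction.
    Scales are illustrative placeholders, not the repo defaults.
    """
    return uncond + cfg_scale * (cond - uncond) + slg_scale * (cond - cond_skip)
```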

ControlNets

To use the SD3.5 Large ControlNets, additionally download your chosen ControlNet model from the model repository, then run inference like so:

  • Blur:
python sd3_infer.py --model models/sd3.5_large.safetensors --controlnet_ckpt models/sd3.5_large_controlnet_blur.safetensors --controlnet_cond_image inputs/blur.png --prompt "generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater"
  • Canny:
python sd3_infer.py --model models/sd3.5_large.safetensors --controlnet_ckpt models/sd3.5_large_controlnet_canny.safetensors --controlnet_cond_image inputs/canny.png --prompt "A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms"
  • Depth:
python sd3_infer.py --model models/sd3.5_large.safetensors --controlnet_ckpt models/sd3.5_large_controlnet_depth.safetensors --controlnet_cond_image inputs/depth.png --prompt "photo of woman, presumably in her mid-thirties, striking a balanced yoga pose on a rocky outcrop during dusk or dawn. She wears a light gray t-shirt and dark leggings. Her pose is dynamic, with one leg extended backward and the other bent at the knee, holding the moon close to her hand."

For details on preprocessing for each of the ControlNets, and examples, please review the model card.

File Guide

  • sd3_infer.py - entry point; review this file for basic usage of the diffusion model
  • sd3_impls.py - contains the wrapper around the MMDiTX and the VAE
  • other_impls.py - contains the CLIP models, the T5 model, and some utilities
  • mmditx.py - contains the core of the MMDiT-X itself
  • a models folder with the following files (download separately):
    • clip_l.safetensors (OpenAI CLIP-L, same as SDXL/SD3, can grab a public copy)
    • clip_g.safetensors (OpenCLIP bigG, same as SDXL/SD3, can grab a public copy)
    • t5xxl.safetensors (Google T5-v1.1-XXL, can grab a public copy)
    • sd3.5_large.safetensors or sd3.5_large_turbo.safetensors or sd3.5_medium.safetensors (or sd3_medium.safetensors)

Code Origin

The code included here originates from:

  • Stability AI internal research code repository (MM-DiT)
  • Public Stability AI repositories (e.g. the VAE)
  • Some unique code for this reference repo written by Alex Goodwin and Vikram Voleti for Stability AI
  • Some code from ComfyUI internal Stability implementation of SD3 (for some code corrections and handlers)
  • HuggingFace and upstream providers (for sections of CLIP/T5 code)

Legal

Check the LICENSE-CODE file.

Note

Some code in other_impls originates from HuggingFace and is subject to the HuggingFace Transformers Apache 2.0 License.