Update README.md
---
license: apache-2.0
---
# TDM: Learning Few-Step Diffusion Models by Trajectory Distribution Matching

<div align="center">
<a href="https://tdm-t2x.github.io/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github-Page&color=blue&logo=github-pages"></a>
<a href="https://arxiv.org/abs/2503.06674"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:TDM&color=red&logo=arxiv"></a>
</div>

This is the official repository of "[Learning Few-Step Diffusion Models by Trajectory Distribution Matching](https://arxiv.org/abs/2503.06674)" by *Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, Jing Tang*.
## User Study Time!



Which one do you think is better? Some of the images were generated by PixArt-α (50 NFE). The others were generated by **TDM (4 NFE)**, distilled from PixArt-α in a data-free way with merely 500 training iterations and 2 A800 GPU hours.

<details>
<summary style="color: #1E88E5; cursor: pointer; font-size: 1.2em;">Click for answer</summary>
<p style="font-size: 1.2em; margin-top: 8px;">TDM's positions (left to right): bottom, bottom, top, bottom, top.</p>
</details>
## Fast Text-to-Video Generation

Our proposed TDM can be easily extended to text-to-video generation.

<p align="center">
<img src="teacher.gif" alt="Teacher" width="100%">
<img src="student.gif" alt="Student" width="100%">
</p>

The video above was generated by CogVideoX-2B (100 NFE). In the same amount of time, **TDM (4 NFE)** can generate 25 videos, as shown below, achieving an impressive **25× speedup without performance degradation**. (Note: the noise in the GIFs is due to compression.)
## Usage

### TDM-SD3-LoRA
```python
import torch
from diffusers import StableDiffusion3Pipeline, AutoencoderTiny, DPMSolverMultistepScheduler
from diffusers.utils import make_image_grid

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights('Luo-Yihong/TDM_sd3_lora', adapter_name='tdm')  # Load TDM-LoRA.
pipe.set_adapters(["tdm"], [0.125])  # IMPORTANT: set the LoRA scale to 0.125.
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd3", torch_dtype=torch.float16)  # Save GPU memory.
pipe.vae.config.shift_factor = 0.0
pipe = pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
pipe.scheduler.config['flow_shift'] = 6  # flow_shift can be varied from 1 to 6.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Few-step generation with TDM (4 NFE).
generator = torch.manual_seed(8888)
image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=4,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=1.,
    generator=generator,
).images[0]

# Teacher generation with the original SD3 (28 steps with CFG, i.e., 56 NFE).
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
pipe.set_adapters(["tdm"], [0.])  # Unload the LoRA.
generator = torch.manual_seed(8888)
teacher_image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=28,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=7.,
    generator=generator,
).images[0]
make_image_grid([image, teacher_image], 1, 2)
```



The sample generated by SD3 with 56 NFE is on the right, and the sample generated by **TDM** with 4 NFE is on the left. Which one do you feel is better?
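The snippet above fixes `flow_shift = 6`, but as the comment notes, the value can be varied from 1 to 6. If you want to compare settings, a minimal sketch like the following rebuilds the scheduler per value under a fixed seed (it reuses `pipe` from the snippet above; the value list is just an example):

```python
# Sweep flow_shift with a fixed seed so only the sampling schedule changes.
# Assumes `pipe` from the TDM-SD3-LoRA snippet above is still loaded.
import torch
from diffusers import DPMSolverMultistepScheduler
from diffusers.utils import make_image_grid

images = []
for shift in [1, 2, 4, 6]:  # example values within the suggested 1-6 range
    config = dict(pipe.scheduler.config)
    config['flow_shift'] = shift  # larger shift concentrates steps at higher noise
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(config)
    generator = torch.manual_seed(8888)  # fixed seed isolates the schedule's effect
    images.append(pipe(
        prompt="A cute panda holding a sign says TDM SOTA!",
        num_inference_steps=4,
        guidance_scale=1.,
        generator=generator,
    ).images[0])
make_image_grid(images, 1, 4)
```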
### TDM-Dreamshaper-v7-LoRA
```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from huggingface_hub import hf_hub_download

repo_name = "Luo-Yihong/TDM_dreamshaper_LoRA"
ckpt_name = "tdm_dreamshaper.pt"
pipe = DiffusionPipeline.from_pretrained('lykon/dreamshaper-7', torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))  # Load TDM-LoRA.
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
generator = torch.manual_seed(317)
image = pipe(
    prompt="A close-up photo of an Asian lady with sunglasses",
    negative_prompt="",
    num_inference_steps=4,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.,
).images[0]
image
```


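If you keep the LoRA permanently enabled, you can optionally fuse it into the base weights so inference skips the adapter indirection. A minimal sketch using diffusers' generic `fuse_lora` API, continuing from the snippet above (the output filename is just an example):

```python
# Optional: fuse the TDM LoRA into the base weights (diffusers' generic API).
# Continues from the Dreamshaper snippet above; generation itself is unchanged.
pipe.fuse_lora()

generator = torch.manual_seed(317)
image = pipe(
    prompt="A close-up photo of an Asian lady with sunglasses",
    num_inference_steps=4,
    guidance_scale=1.,
    generator=generator,
).images[0]
image.save("tdm_dreamshaper_fused.png")  # example output path
```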
### TDM-CogVideoX-2B-LoRA
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.vae.enable_slicing()  # Save memory.
pipe.vae.enable_tiling()  # Save memory.
pipe.load_lora_weights("Luo-Yihong/TDM_CogVideoX-2B_LoRA")
pipe.to("cuda")
prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The "
    "panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance"
)
# We trained the generator on timesteps [999, 856, 665, 399].
# The official CogVideoX scheduler uses uniform timestep spacing, which may cause
# inferior results, but TDM-LoRA still works well under 4 NFE.
# We will update TDM-CogVideoX-LoRA soon for better performance!
generator = torch.manual_seed(8888)
frames = pipe(prompt, guidance_scale=1,
              num_inference_steps=4,
              num_frames=49,
              generator=generator,
              use_dynamic_cfg=True).frames[0]
export_to_video(frames, "output-TDM.mp4", fps=8)
```
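As the comments above note, the stock scheduler spaces its 4 steps uniformly while the LoRA was trained on [999, 856, 665, 399]. A minimal sketch to inspect which timesteps the scheduler will actually use (it assumes the `pipe` object from the snippet above):

```python
# Inspect the 4 timesteps the stock scheduler actually picks; a mismatch with
# the training timesteps [999, 856, 665, 399] is expected and tolerated by TDM.
pipe.scheduler.set_timesteps(num_inference_steps=4, device="cuda")
print(pipe.scheduler.timesteps)  # e.g. roughly uniformly spaced values
```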
## 🔥 Pre-trained Models
We release a bucket of TDM-LoRAs. Please enjoy them!
- [TDM-SD3-LoRA](https://huggingface.co/Luo-Yihong/TDM_sd3_lora)
- [TDM-CogVideoX-2B-LoRA](https://huggingface.co/Luo-Yihong/TDM_CogVideoX-2B_LoRA)
- [TDM-Dreamshaper-LoRA](https://huggingface.co/Luo-Yihong/TDM_dreamshaper_LoRA)
## Contact

Please contact Yihong Luo ([email protected]) if you have any questions about this work.

## Bibtex

```bibtex
@misc{luo2025tdm,
  title={Learning Few-Step Diffusion Models by Trajectory Distribution Matching},
  author={Yihong Luo and Tianyang Hu and Jiacheng Sun and Yujun Cai and Jing Tang},
  year={2025},
  eprint={2503.06674},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.06674},
}
```