---
license: apache-2.0
---
# TDM: Learning Few-Step Diffusion Models by Trajectory Distribution Matching
<div align="center">
<a href="https://tdm-t2x.github.io/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github-Page&color=blue&logo=github-pages"></a> &ensp;
<a href="https://arxiv.org/abs/2503.06674"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:TDM&color=red&logo=arxiv"></a> &ensp;
</div>

This is the official repository of "[Learning Few-Step Diffusion Models by Trajectory Distribution Matching](https://arxiv.org/abs/2503.06674)", by *Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, Jing Tang*.

## User Study Time!
![user_study](user_study.jpg)
Which one do you think is better? Some of these images were generated by Pixart-α (50 NFE), and the others by **TDM (4 NFE)**, distilled from Pixart-α in a data-free way with merely 500 training iterations and 2 A800 hours.

<details>

<summary style="color: #1E88E5; cursor: pointer; font-size: 1.2em;"> Click for answer</summary>

<p style="font-size: 1.2em; margin-top: 8px;">TDM's positions (from left to right): bottom, bottom, top, bottom, top.</p>

</details>

## Fast Text-to-Video Generation

Our proposed TDM can be easily extended to text-to-video.

<p align="center">
<img src="teacher.gif" alt="Teacher" width="100%">
<img src="student.gif" alt="Student" width="100%">
</p>

The video above was generated by CogVideoX-2B (100 NFE). In the same amount of time, **TDM (4 NFE)** can generate 25 videos, as shown below, achieving an impressive **25× speedup (100 NFE / 4 NFE) without performance degradation**. (Note: the noise in the GIFs is due to compression.)

## Usage
### TDM-SD3-LoRA
```python
import torch
from diffusers import StableDiffusion3Pipeline, AutoencoderTiny, DPMSolverMultistepScheduler
from diffusers.utils import make_image_grid

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("Luo-Yihong/TDM_sd3_lora", adapter_name="tdm")  # Load TDM-LoRA.
pipe.set_adapters(["tdm"], [0.125])  # IMPORTANT: set the LoRA scale to 0.125.
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd3", torch_dtype=torch.float16)  # Save GPU memory.
pipe.vae.config.shift_factor = 0.0
pipe = pipe.to("cuda")  # Move the freshly loaded VAE to GPU as well.
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
pipe.scheduler.config['flow_shift'] = 6  # flow_shift can be set anywhere from 1 to 6.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# TDM student: 4 steps without CFG (4 NFE).
generator = torch.manual_seed(8888)
image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=4,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=1.,
    generator=generator,
).images[0]

# Teacher baseline: unload the LoRA and sample with 28 steps and CFG 7 (56 NFE).
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
pipe.set_adapters(["tdm"], [0.])  # Unload the LoRA.
generator = torch.manual_seed(8888)
teacher_image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=28,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=7.,
    generator=generator,
).images[0]
make_image_grid([image, teacher_image], 1, 2)
```
![sd3_compare](sd3_compare.jpg)
The sample generated by SD3 with 56 NFE is on the right; the sample generated by **TDM** with 4 NFE is on the left. Which one do you think is better?
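
The scheduler's `flow_shift` noticeably affects 4-step sampling. As a quick way to pick a value, here is a minimal sketch (our own illustration, not from the paper) that reuses the pipeline above with the TDM LoRA re-enabled and sweeps `flow_shift` over the 1-to-6 range mentioned in the code comment:

```python
# Hypothetical helper sketch: compare flow_shift values 1..6 at 4 NFE.
# Assumes `pipe` and `make_image_grid` from the snippet above are still in scope.
import torch
from diffusers import DPMSolverMultistepScheduler

pipe.set_adapters(["tdm"], [0.125])  # Re-enable the TDM LoRA.
images = []
for flow_shift in range(1, 7):
    config = dict(pipe.scheduler.config)
    config["flow_shift"] = flow_shift  # Try a different shift for this run.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(config)
    generator = torch.manual_seed(8888)
    images.append(pipe(
        prompt="A cute panda holding a sign says TDM SOTA!",
        num_inference_steps=4,
        guidance_scale=1.,
        generator=generator,
    ).images[0])
make_image_grid(images, 1, 6)  # One column per flow_shift value.
```
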
### TDM-Dreamshaper-v7-LoRA
```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from huggingface_hub import hf_hub_download

repo_name = "Luo-Yihong/TDM_dreamshaper_LoRA"
ckpt_name = "tdm_dreamshaper.pt"
pipe = DiffusionPipeline.from_pretrained("lykon/dreamshaper-7", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))  # Load TDM-LoRA.
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
generator = torch.manual_seed(317)
image = pipe(
    prompt="A close-up photo of an Asian lady with sunglasses",
    negative_prompt="",
    num_inference_steps=4,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.,
).images[0]
image
```
![tdm_dreamshaper](tdm_dreamshaper.jpg)
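
If you want a side-by-side with the teacher, here is a minimal sketch (our own illustration, not from the paper): unload the TDM LoRA and sample Dreamshaper-v7 with a typical SD1.5-style setting. The 25 steps and CFG 7.5 below are assumptions for illustration, not settings reported by the authors.

```python
# Hypothetical comparison sketch: teacher sample vs. the 4-NFE TDM sample above.
# Assumes `pipe`, `torch`, and `image` from the snippet above are still in scope.
from diffusers.utils import make_image_grid

pipe.unload_lora_weights()  # Drop the TDM LoRA to recover the teacher model.
generator = torch.manual_seed(317)
teacher_image = pipe(
    prompt="A close-up photo of an Asian lady with sunglasses",
    negative_prompt="",
    num_inference_steps=25,   # Assumed teacher setting.
    guidance_scale=7.5,       # Assumed teacher setting.
    generator=generator,
).images[0]
make_image_grid([image, teacher_image], 1, 2)
```
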
### TDM-CogVideoX-2B-LoRA
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.vae.enable_slicing()  # Save memory.
pipe.vae.enable_tiling()  # Save memory.
pipe.load_lora_weights("Luo-Yihong/TDM_CogVideoX-2B_LoRA")  # Load TDM-LoRA.
pipe.to("cuda")
prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The "
    "panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance"
)
# We train the generator on timesteps [999, 856, 665, 399].
# The official CogVideoX scheduler uses uniform timestep spacing, which may cause inferior results.
# But TDM-LoRA still works well under 4 NFE.
# We will update the TDM-CogVideoX-LoRA soon for better performance!
generator = torch.manual_seed(8888)
frames = pipe(
    prompt,
    guidance_scale=1,
    num_inference_steps=4,
    num_frames=49,
    generator=generator,
    use_dynamic_cfg=True,
).frames[0]
export_to_video(frames, "output-TDM.mp4", fps=8)
```
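
To reproduce the 100-NFE teacher video referenced above for comparison, a minimal sketch (our own illustration, not from the paper) is to load a fresh CogVideoX-2B pipeline without the TDM LoRA and sample with classifier-free guidance. The 50 steps and CFG 6 below are assumed values, not settings reported by the authors.

```python
# Hypothetical teacher baseline: 50 steps with CFG (two model passes per step) = 100 NFE.
# Assumes `prompt`, `torch`, and `export_to_video` from the snippet above are in scope.
from diffusers import CogVideoXPipeline

teacher_pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
teacher_pipe.vae.enable_slicing()  # Save memory.
teacher_pipe.vae.enable_tiling()  # Save memory.
teacher_pipe.to("cuda")

generator = torch.manual_seed(8888)
teacher_frames = teacher_pipe(
    prompt,                    # Same prompt as the TDM run above.
    guidance_scale=6,          # Assumed CFG value.
    num_inference_steps=50,    # Assumed step count; with CFG this is 100 NFE.
    num_frames=49,
    generator=generator,
    use_dynamic_cfg=True,
).frames[0]
export_to_video(teacher_frames, "output-teacher.mp4", fps=8)
```
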
## 🔥 Pre-trained Models
We release a collection of TDM-LoRAs. Please enjoy them!
- [TDM-SD3-LoRA](https://huggingface.co/Luo-Yihong/TDM_sd3_lora)
- [TDM-CogVideoX-2B-LoRA](https://huggingface.co/Luo-Yihong/TDM_CogVideoX-2B_LoRA)
- [TDM-Dreamshaper-LoRA](https://huggingface.co/Luo-Yihong/TDM_dreamshaper_LoRA)

## Contact

Please contact Yihong Luo ([email protected]) if you have any questions about this work.

## BibTeX

```bibtex
@misc{luo2025tdm,
      title={Learning Few-Step Diffusion Models by Trajectory Distribution Matching},
      author={Yihong Luo and Tianyang Hu and Jiacheng Sun and Yujun Cai and Jing Tang},
      year={2025},
      eprint={2503.06674},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.06674},
}
```