Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14, 2024 • 55
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published Nov 7, 2024 • 70
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published Nov 5, 2024 • 25
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation Paper • 2410.07171 • Published Oct 9, 2024 • 42
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization Paper • 2410.06244 • Published Oct 8, 2024 • 19
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 33
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published Nov 4, 2024 • 25
AutoVFX: Physically Realistic Video Editing from Natural Language Instructions Paper • 2411.02394 • Published Nov 4, 2024 • 17
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 78
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14, 2024 • 17
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Paper • 2410.13830 • Published Oct 17, 2024 • 25
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models Paper • 2411.05007 • Published Nov 7, 2024 • 18
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Paper • 2411.07232 • Published Nov 11, 2024 • 65
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published Nov 11, 2024 • 47
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14, 2024 • 68
AnimateAnything: Consistent and Controllable Animation for Video Generation Paper • 2411.10836 • Published Nov 16, 2024 • 22
Stylecodes: Encoding Stylistic Information For Image Generation Paper • 2411.12811 • Published Nov 19, 2024 • 12
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published Nov 22, 2024 • 9
Style-Friendly SNR Sampler for Style-Driven Generation Paper • 2411.14793 • Published Nov 22, 2024 • 36
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published Nov 22, 2024 • 55
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows Paper • 2412.01169 • Published Dec 2, 2024 • 13
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Paper • 2412.02687 • Published Dec 3, 2024 • 109
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training Paper • 2412.02030 • Published Dec 2, 2024 • 19
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance Paper • 2412.05355 • Published Dec 6, 2024 • 9
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper • 2412.04431 • Published Dec 5, 2024 • 18
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 47
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Paper • 2412.07674 • Published Dec 10, 2024 • 20
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Paper • 2412.07774 • Published Dec 10, 2024 • 28
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation Paper • 2412.05148 • Published Dec 6, 2024 • 11
ObjCtrl-2.5D: Training-free Object Control with Camera Poses Paper • 2412.07721 • Published Dec 10, 2024 • 8
StyleMaster: Stylize Your Video with Artistic Generation and Translation Paper • 2412.07744 • Published Dec 10, 2024 • 19
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models Paper • 2412.08629 • Published Dec 11, 2024 • 12
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation Paper • 2412.09349 • Published Dec 12, 2024 • 8
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published Dec 11, 2024 • 45
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Paper • 2412.09618 • Published Dec 12, 2024 • 21
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models Paper • 2412.09622 • Published Dec 12, 2024 • 8
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published Dec 12, 2024 • 20
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024 • 26
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition Paper • 2412.19712 • Published Dec 27, 2024 • 15
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models Paper • 2412.19645 • Published Dec 27, 2024 • 13
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published Jan 2 • 51
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration Paper • 2501.01320 • Published Jan 2 • 11
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Paper • 2501.04698 • Published Jan 8 • 14
Diffusion Adversarial Post-Training for One-Step Video Generation Paper • 2501.08316 • Published Jan 14 • 33
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering Paper • 2501.05131 • Published Jan 9 • 34
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation Paper • 2501.09503 • Published Jan 16 • 13
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published Jan 16 • 70
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Paper • 2501.18427 • Published Jan 30 • 17
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published Jan 24 • 31
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models Paper • 2502.03032 • Published Feb 5 • 58
Generating Multi-Image Synthetic Data for Text-to-Image Customization Paper • 2502.01720 • Published Feb 3 • 8
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 2025 • 34
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published Feb 3 • 189
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published Feb 2025 • 30
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 2025 • 38
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 2025 • 69
KV-Edit: Training-Free Image Editing for Precise Background Preservation Paper • 2502.17363 • Published Feb 2025 • 33
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs Paper • 2502.18461 • Published Feb 2025 • 15
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization Paper • 2503.06698 • Published Mar 2025 • 2
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer Paper • 2503.07027 • Published Mar 2025 • 21
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 2025 • 52