Submitted by akhaliq · 67 upvotes · Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing · 7 authors · 5 comments
Submitted by yulunliu · 52 upvotes · NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing · 6 authors · 2 comments
Submitted by myownskyW7 · 41 upvotes · MotionClone: Training-Free Motion Cloning for Controllable Video Generation · 9 authors · 4 comments
Submitted by yixinsong · 38 upvotes · PowerInfer-2: Fast Large Language Model Inference on a Smartphone · 6 authors · 5 comments
Submitted by Liuff23 · 38 upvotes · Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion · 6 authors · 4 comments
Submitted by lixin4ever · 36 upvotes · VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs · 11 authors · 2 comments
Submitted by jedyang97 · 30 upvotes · 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination · 7 authors · 2 comments
Submitted by akhaliq · 28 upvotes · MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos · 14 authors
Submitted by yixinsong · 27 upvotes · Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters · 7 authors · 2 comments
Submitted by GlyphByT5 · 21 upvotes · FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation · 8 authors
Submitted by chrlu · 17 upvotes · Discovering Preference Optimization Algorithms with and for Large Language Models · 7 authors
Submitted by akhaliq · 17 upvotes · AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation · 5 authors
Submitted by akhaliq · 16 upvotes · Hierarchical Patch Diffusion Models for High-Resolution Video Generation · 4 authors
Submitted by yifanzhang114 · 14 upvotes · Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models · 7 authors · 2 comments
Submitted by chrisliu298 · 10 upvotes · Large Language Model Unlearning via Embedding-Corrupted Prompts · 4 authors
Submitted by AliBehrouz · 10 upvotes · Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models · 3 authors · 1 comment