Submitted by akhaliq · 186 · GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection · 6 authors · 15
Submitted by akhaliq · 63 · ShortGPT: Layers in Large Language Models are More Redundant Than You Expect · 8 authors · 21
Submitted by akhaliq · 40 · PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation · 10 authors · 1
Submitted by akhaliq · 21 · Learning to Decode Collaboratively with Multiple Language Models · 5 authors · 6
Submitted by akhaliq · 16 · Stop Regressing: Training Value Functions via Classification for Scalable Deep RL · 12 authors · 1
Submitted by akhaliq · 14 · Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling · 6 authors · 1