SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models Paper • 2503.07605 • Published 4 days ago • 63
NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices Paper • 2403.10425 • Published Mar 15, 2024 • 4
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting Paper • 2403.09981 • Published Mar 15, 2024 • 8
FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model Paper • 2403.10242 • Published Mar 15, 2024 • 12
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba Paper • 2403.09977 • Published Mar 15, 2024 • 11
Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding Paper • 2403.10395 • Published Mar 15, 2024 • 9
Recurrent Drafter for Fast Speculative Decoding in Large Language Models Paper • 2403.09919 • Published Mar 14, 2024 • 22
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations Paper • 2403.09704 • Published Mar 8, 2024 • 32
RAFT: Adapting Language Model to Domain Specific RAG Paper • 2403.10131 • Published Mar 15, 2024 • 70
VideoAgent: Long-form Video Understanding with Large Language Model as Agent Paper • 2403.10517 • Published Mar 15, 2024 • 35
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer Paper • 2403.10301 • Published Mar 15, 2024 • 53
LocalMamba: Visual State Space Model with Windowed Selective Scan Paper • 2403.09338 • Published Mar 14, 2024 • 9
Veagle: Advancements in Multimodal Representation Learning Paper • 2403.08773 • Published Jan 18, 2024 • 10
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14, 2024 • 15
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding Paper • 2403.09530 • Published Mar 14, 2024 • 10
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 10
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14, 2024 • 22
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Paper • 2403.09622 • Published Mar 14, 2024 • 18
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring Paper • 2403.09333 • Published Mar 14, 2024 • 16