A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons • Article by NormalUhr • Published Feb 4
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! • Paper 2502.07374 • Published Feb 11, 2025
SnapKV: LLM Knows What You are Looking for Before Generation • Paper 2404.14469 • Published Apr 22, 2024
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders • Paper 2404.05961 • Published Apr 9, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • Paper 2404.07143 • Published Apr 10, 2024
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model • Paper 2404.04167 • Published Apr 5, 2024
Stream of Search (SoS): Learning to Search in Language • Paper 2404.03683 • Published Apr 1, 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction • Paper 2404.02905 • Published Apr 3, 2024
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • Paper 2404.02258 • Published Apr 2, 2024
Advancing LLM Reasoning Generalists with Preference Trees • Paper 2404.02078 • Published Apr 2, 2024
Long-context LLMs Struggle with Long In-context Learning • Paper 2404.02060 • Published Apr 2, 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models • Paper 2403.18814 • Published Mar 27, 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? • Paper 2403.14624 • Published Mar 21, 2024