HF中国镜像站

Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:1910.01108

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 1

Running on CPU Upgrade

2.03k

2.03k

Anychat

🏢

Display code snippets for different chat providers
Running

267

267

Qwen2.5 Coder Artifacts

🐢

Generate application code with Qwen2.5-Coder-32B
Running

906

906

QwQ-32B-Preview

🔍

QwQ-32B-Preview
Running on CPU Upgrade

12.7k

12.7k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots

chat-models-candidates

nikravan/glm-4vq

Document Question Answering • Updated Jun 16, 2024 • 659 • 32
deepseek-ai/deepseek-coder-33b-instruct

Text Generation • Updated Mar 7, 2024 • 33.1k • 500
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024 • 60
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 55
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 13

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14
distilbert/distilbert-base-uncased-finetuned-sst-2-english

Text Classification • Updated Dec 19, 2023 • 6.69M • • 707
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

Paper • 2401.14112 • Published Jan 25, 2024 • 20
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Paper • 2401.04092 • Published Jan 8, 2024 • 21

machine learning and neural network papers 📜

about 4 hours ago

SMOTE: Synthetic Minority Over-sampling Technique

Paper • 1106.1813 • Published Jun 9, 2011 • 1
Scikit-learn: Machine Learning in Python

Paper • 1201.0490 • Published Jan 2, 2012 • 1
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Paper • 1406.1078 • Published Jun 3, 2014
Distributed Representations of Sentences and Documents

Paper • 1405.4053 • Published May 16, 2014

language-models

Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 46
BloombergGPT: A Large Language Model for Finance

Paper • 2303.17564 • Published Mar 30, 2023 • 22
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 55
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 7
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14

Llemma: An Open Language Model For Mathematics

Paper • 2310.10631 • Published Oct 16, 2023 • 53
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 46
Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 35
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

Paper • 2309.11568 • Published Sep 20, 2023 • 10

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs