SMOTE: Synthetic Minority Over-sampling Technique Paper • 1106.1813 • Published Jun 9, 2011 • 1
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation Paper • 1406.1078 • Published Jun 3, 2014
Distributed Representations of Sentences and Documents Paper • 1405.4053 • Published May 16, 2014
Sequence to Sequence Learning with Neural Networks Paper • 1409.3215 • Published Sep 10, 2014 • 3
Neural Machine Translation by Jointly Learning to Align and Translate Paper • 1409.0473 • Published Sep 1, 2014 • 5
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Paper • 1804.07461 • Published Apr 20, 2018 • 4
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Paper • 1810.04805 • Published Oct 11, 2018 • 17
RoBERTa: A Robustly Optimized BERT Pretraining Approach Paper • 1907.11692 • Published Jul 26, 2019 • 7
Energy and Policy Considerations for Deep Learning in NLP Paper • 1906.02243 • Published Jun 5, 2019 • 2
XLNet: Generalized Autoregressive Pretraining for Language Understanding Paper • 1906.08237 • Published Jun 19, 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Paper • 1910.01108 • Published Oct 2, 2019 • 14
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Paper • 1910.10683 • Published Oct 23, 2019 • 11
AR-Net: A simple Auto-Regressive Neural Network for time-series Paper • 1911.12436 • Published Nov 27, 2019 • 1
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators Paper • 2003.10555 • Published Mar 23, 2020
SQuAD: 100,000+ Questions for Machine Comprehension of Text Paper • 1606.05250 • Published Jun 16, 2016 • 3
Mish: A Self Regularized Non-Monotonic Activation Function Paper • 1908.08681 • Published Aug 23, 2019 • 1
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Paper • 2101.00027 • Published Dec 31, 2020 • 6
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Paper • 2101.03961 • Published Jan 11, 2021 • 13
LoRA: Low-Rank Adaptation of Large Language Models Paper • 2106.09685 • Published Jun 17, 2021 • 35
Evaluating Large Language Models Trained on Code Paper • 2107.03374 • Published Jul 7, 2021 • 8
NeuralProphet: Explainable Forecasting at Scale Paper • 2111.15397 • Published Nov 29, 2021 • 1
LLaMA: Open and Efficient Foundation Language Models Paper • 2302.13971 • Published Feb 27, 2023 • 14
PyTorch: An Imperative Style, High-Performance Deep Learning Library Paper • 1912.01703 • Published Dec 3, 2019 • 1
TensorFlow: A system for large-scale machine learning Paper • 1605.08695 • Published May 27, 2016 • 1
Theano: A Python framework for fast computation of mathematical expressions Paper • 1605.02688 • Published May 9, 2016 • 1
Caffe: Convolutional Architecture for Fast Feature Embedding Paper • 1408.5093 • Published Jun 20, 2014 • 1
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents Paper • 1901.08149 • Published Jan 23, 2019 • 3
Annotated History of Modern AI and Deep Learning Paper • 2212.11279 • Published Dec 21, 2022 • 1
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2, 2024 • 121
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 143
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 66
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 93
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering Paper • 2405.15793 • Published May 6, 2024 • 5