The library currently contains PyTorch, TensorFlow and Flax implementations, pretrained model weights, usage scripts and conversion utilities for the following models:
ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
DeBERTa (from Microsoft Research) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
MarianMT: machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
MBart (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
SqueezeBert released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
The table below shows the current support in the library for each of those models: whether they have a Python tokenizer (called “slow”), whether they have a “fast” tokenizer backed by the 🤗 Tokenizers library, and whether they are supported in PyTorch, TensorFlow and/or Flax.