pytorch-tpu / transformers
🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
☆15 Updated this week
Alternatives and similar repositories for transformers:
Users interested in transformers are comparing it to the libraries listed below.
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, though it should work with any Hugging Face text dataset.☆93 Updated 2 years ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining☆14 Updated last year
- ☆73 Updated last year
- Calculating the expected time for training an LLM.☆38 Updated last year
- The official implementation of "Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks" (NAACL 2022).☆43 Updated 2 years ago
- PyTorch implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks☆63 Updated 3 years ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP☆58 Updated 2 years ago
- ☆55 Updated 2 years ago
- Megatron-LM 11B on Hugging Face Transformers☆27 Updated 3 years ago
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages.☆74 Updated 3 years ago
- Tools for managing datasets for governance and training.☆82 Updated 2 weeks ago
- ☆96 Updated 2 years ago
- [NAACL 2021] Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering☆36 Updated 3 years ago
- BLOOM+1: Adapting the BLOOM model to support a new, unseen language☆70 Updated 11 months ago
- [TMLR'23] Contrastive Search Is What You Need For Neural Text Generation☆119 Updated last year
- The official repository for the paper "Efficient Long-Text Understanding Using Short-Text Models" (Ivgi et al., 2022)☆68 Updated last year
- DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization (ACL 2022)☆50 Updated last year
- ☆77 Updated last year
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning☆97 Updated last year
- ☆21 Updated 3 years ago
- Dense hybrid representations for text retrieval☆62 Updated last year
- ☆97 Updated 2 years ago
- PyTorch reimplementation of REALM and ORQA☆22 Updated 3 years ago
- Transformers at any scale☆41 Updated last year
- Pre-training BART in Flax on The Pile dataset☆20 Updated 3 years ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation"☆15 Updated last year
- ☆67 Updated 2 years ago
- Hugging Face RoBERTa with Flash Attention 2☆21 Updated last year
- Google's BigBird (Jax/Flax & PyTorch) @ 🤗Transformers☆48 Updated last year
- Train 🤗Transformers with DeepSpeed: ZeRO-2, ZeRO-3☆23 Updated 3 years ago