FairSeq repo with Apollo optimizer
☆114 · Dec 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for fairseq-apollo
Users who are interested in fairseq-apollo are comparing it to the repositories listed below.
- A PyTorch Implementation of the Luna: Linear Unified Nested Attention ☆41 · Jul 29, 2021 · Updated 4 years ago
- Sequence modeling with Mega. ☆303 · Jan 28, 2023 · Updated 3 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 2 years ago
- Efficient PScan implementation in PyTorch ☆17 · Jan 2, 2024 · Updated 2 years ago
- The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha… ☆23 · Dec 30, 2022 · Updated 3 years ago
- ☆14 · Nov 20, 2022 · Updated 3 years ago
- PyTorch Implementation of NeurIPS 2020 paper "Learning Sparse Prototypes for Text Generation" ☆22 · Jul 8, 2021 · Updated 4 years ago
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on … ☆14 · Sep 18, 2025 · Updated 5 months ago
- MeCab model trained with OpenKorPos. ☆23 · Jun 19, 2022 · Updated 3 years ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion from our EMNLP 2023 paper - Accelerating Toeplitz… ☆14 · Oct 17, 2023 · Updated 2 years ago
- ☆16 · Oct 16, 2024 · Updated last year
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- Implementation of QKVAE ☆11 · Feb 24, 2023 · Updated 3 years ago
- Deep neural models for core NLP tasks ☆13 · Nov 9, 2017 · Updated 8 years ago
- ☆13 · Feb 7, 2023 · Updated 3 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" ☆59 · Jan 12, 2023 · Updated 3 years ago
- A utility for storing and reading files for Korean LM training 💾 ☆35 · Oct 15, 2025 · Updated 4 months ago
- Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization ☆182 · Nov 21, 2021 · Updated 4 years ago
- Variable-order CRFs with structure learning ☆17 · Aug 1, 2024 · Updated last year
- We can crawl NaverBlog, Twitter, Youtube!! ☆14 · Sep 13, 2019 · Updated 6 years ago
- EMNLP 2022: Finding Dataset Shortcuts with Grammar Induction https://arxiv.org/abs/2210.11560 ☆58 · Feb 28, 2025 · Updated last year
- FaVIQ: Fact Verification from Information-seeking Questions ☆43 · Nov 23, 2022 · Updated 3 years ago
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021). ☆228 · Apr 18, 2022 · Updated 3 years ago
- Code for EMNLP 2021 paper: Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting ☆17 · Nov 30, 2021 · Updated 4 years ago
- Implementation of ICML 22 Paper: Scaling Structured Inference with Randomization ☆14 · Jul 24, 2022 · Updated 3 years ago
- ☆44 · Sep 16, 2020 · Updated 5 years ago
- Staged Training for Transformer Language Models ☆33 · Mar 31, 2022 · Updated 3 years ago
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation ☆475 · Mar 7, 2024 · Updated last year
- Easy Language Model Pretraining leveraging Huggingface's Transformers and Datasets ☆130 · Nov 12, 2022 · Updated 3 years ago
- EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering ☆68 · Nov 26, 2021 · Updated 4 years ago
- Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization ☆16 · Jun 5, 2018 · Updated 7 years ago
- The official repository for the paper Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) ☆70 · May 14, 2023 · Updated 2 years ago
- [ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM ☆190 · Jan 27, 2025 · Updated last year
- Official PyTorch implementation of Time-aware Large Kernel (TaLK) Convolutions (ICML 2020) ☆29 · Dec 9, 2020 · Updated 5 years ago
- Tetra-Tagging: Word-Synchronous Parsing with Linear-Time Inference ☆15 · Jul 6, 2020 · Updated 5 years ago
- Code Repository for "Please Mind the Root: Decoding Arborescences for Dependency Parsing" and "On Finding the K-best Non-projective Depen… ☆20 · Dec 12, 2022 · Updated 3 years ago
- Long Range Arena for Benchmarking Efficient Transformers ☆781 · Dec 16, 2023 · Updated 2 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest… ☆31 · Jul 9, 2020 · Updated 5 years ago
- 🚀 Implementation of easy-to-use 3D parallelism based on Huggingface Transformers & Microsoft DeepSpeed ☆31 · Feb 5, 2022 · Updated 4 years ago