FairSeq repo with Apollo optimizer
☆114 · Dec 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for fairseq-apollo
Users who are interested in fairseq-apollo compare it to the libraries listed below.
- A PyTorch Implementation of the Luna: Linear Unified Nested Attention ☆41 · Jul 29, 2021 · Updated 4 years ago
- Sequence modeling with Mega. ☆303 · Jan 28, 2023 · Updated 3 years ago
- Deep neural models for core NLP tasks ☆13 · Nov 9, 2017 · Updated 8 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 3 years ago
- Efficient PScan implementation in PyTorch ☆17 · Jan 2, 2024 · Updated 2 years ago
- Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization ☆182 · Nov 21, 2021 · Updated 4 years ago
- The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha… ☆23 · Dec 30, 2022 · Updated 3 years ago
- The official repository for Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) paper ☆70 · May 14, 2023 · Updated 2 years ago
- ☆14 · Nov 20, 2022 · Updated 3 years ago
- ☆16 · Oct 16, 2024 · Updated last year
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021). ☆229 · Apr 18, 2022 · Updated 3 years ago
- MeCab model trained with OpenKorPos. ☆23 · Jun 19, 2022 · Updated 3 years ago
- ☆22 · Dec 1, 2021 · Updated 4 years ago
- A utility for storing and reading files for Korean LM training 💾 ☆35 · Oct 15, 2025 · Updated 5 months ago
- PyTorch Implementation of NeurIPS 2020 paper "Learning Sparse Prototypes for Text Generation" ☆22 · Jul 8, 2021 · Updated 4 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" ☆59 · Jan 12, 2023 · Updated 3 years ago
- Variable-order CRFs with structure learning ☆17 · Aug 1, 2024 · Updated last year
- ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation ☆25 · Oct 2, 2020 · Updated 5 years ago
- ☆21 · Sep 17, 2021 · Updated 4 years ago
- ☆44 · Sep 16, 2020 · Updated 5 years ago
- ☆13 · Feb 7, 2023 · Updated 3 years ago
- [ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM ☆190 · Jan 27, 2025 · Updated last year
- Long Range Arena for Benchmarking Efficient Transformers ☆786 · Dec 16, 2023 · Updated 2 years ago
- Momentum Decoding: Open-ended Text Generation as Graph Exploration ☆19 · Jan 27, 2023 · Updated 3 years ago
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation ☆475 · Mar 7, 2024 · Updated 2 years ago
- ☆14 · Apr 12, 2023 · Updated 2 years ago
- Invertible Generative Flows ☆85 · Apr 1, 2021 · Updated 4 years ago
- Code Repository for "Please Mind the Root: Decoding Arborescences for Dependency Parsing" and "On Finding the K-best Non-projective Depen… ☆20 · Dec 12, 2022 · Updated 3 years ago
- FaVIQ: Fact Verification from Information-seeking Questions ☆43 · Nov 23, 2022 · Updated 3 years ago
- Easy Language Model Pretraining leveraging Huggingface's Transformers and Datasets ☆130 · Nov 12, 2022 · Updated 3 years ago
- Pytorch implementation of "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", ICASSP, 2018. ☆19 · Jan 21, 2021 · Updated 5 years ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion from our EMNLP 2023 paper - Accelerating Toeplitz… ☆14 · Oct 17, 2023 · Updated 2 years ago
- Implementation of QKVAE ☆11 · Feb 24, 2023 · Updated 3 years ago
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on … ☆16 · Sep 18, 2025 · Updated 6 months ago
- Code for EMNLP 2021 paper: Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting ☆17 · Nov 30, 2021 · Updated 4 years ago
- Staged Training for Transformer Language Models ☆33 · Mar 31, 2022 · Updated 3 years ago
- ☆36 · Oct 3, 2018 · Updated 7 years ago
- Official Repository for Efficient Linear-Time Attention Transformers. ☆18 · Jun 2, 2024 · Updated last year