FairSeq repo with Apollo optimizer
☆113Dec 20, 2023Updated 2 years ago
Alternatives and similar repositories for fairseq-apollo
Users that are interested in fairseq-apollo are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- A PyTorch Implementation of the Luna: Linear Unified Nested Attention☆41Jul 29, 2021Updated 4 years ago
- Sequence modeling with Mega.☆303Jan 28, 2023Updated 3 years ago
- Deep neural models for core NLP tasks☆13Nov 9, 2017Updated 8 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 3 years ago
- Efficient PScan implementation in PyTorch☆17Jan 2, 2024Updated 2 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena☆207Aug 26, 2023Updated 2 years ago
- Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization☆182Nov 21, 2021Updated 4 years ago
- The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha…☆23Dec 30, 2022Updated 3 years ago
- The official repository for Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) paper☆70May 14, 2023Updated 2 years ago
- ☆14Nov 20, 2022Updated 3 years ago
- ☆16Oct 16, 2024Updated last year
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).☆228Apr 18, 2022Updated 4 years ago
- MeCab model trained with OpenKorPos.☆23Jun 19, 2022Updated 3 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆22Dec 1, 2021Updated 4 years ago
- A utility for storing and reading files for Korean LM training 💾☆35Oct 15, 2025Updated 6 months ago
- PyTorch Implementation of NeurIPS 2020 paper "Learning Sparse Prototypes for Text Generation"☆22Jul 8, 2021Updated 4 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆59Jan 12, 2023Updated 3 years ago
- Variable-order CRFs with structure learning☆17Aug 1, 2024Updated last year
- ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation☆25Oct 2, 2020Updated 5 years ago
- ☆20Sep 17, 2021Updated 4 years ago
- ☆43Sep 16, 2020Updated 5 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012☆49Apr 6, 2022Updated 4 years ago
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆13Feb 7, 2023Updated 3 years ago
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch☆120Aug 4, 2021Updated 4 years ago
- Long Range Arena for Benchmarking Efficient Transformers☆787Dec 16, 2023Updated 2 years ago
- [ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM☆190Jan 27, 2025Updated last year
- Momentum Decoding: Open-ended Text Generation as Graph Exploration☆19Jan 27, 2023Updated 3 years ago
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation☆476Mar 7, 2024Updated 2 years ago
- ☆14Apr 12, 2023Updated 3 years ago
- Invertible Generative Flows☆85Apr 1, 2021Updated 5 years ago
- FaVIQ: Fact Verification from Information-seeking Questions☆43Nov 23, 2022Updated 3 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Code Repository for "Please Mind the Root: Decoding Arborescences for Dependency Parsing" and "On Finding the K-best Non-projective Depen…☆20Dec 12, 2022Updated 3 years ago
- Easy Language Model Pretraining leveraging Huggingface's Transformers and Datasets☆130Nov 12, 2022Updated 3 years ago
- Pytorch implementation of "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", ICASSP, 2018.☆19Jan 21, 2021Updated 5 years ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz…☆14Oct 17, 2023Updated 2 years ago
- Implementation of QKVAE☆11Feb 24, 2023Updated 3 years ago
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …☆16Sep 18, 2025Updated 7 months ago
- Code for EMNLP 2021 paper: Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting☆17Nov 30, 2021Updated 4 years ago