Sequence modeling with Mega.
☆303Jan 28, 2023Updated 3 years ago
Alternatives and similar repositories for mega
Users that are interested in mega are comparing it to the libraries listed below
Sorting:
- FairSeq repo with Apollo optimizer☆114Dec 20, 2023Updated 2 years ago
- ☆164Jan 24, 2023Updated 3 years ago
- Structured state space sequence models☆2,854Jul 17, 2024Updated last year
- ☆316Jan 8, 2025Updated last year
- ☆52Jan 19, 2023Updated 3 years ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…☆67Apr 24, 2024Updated last year
- Language Modeling with the H3 State Space Model☆522Sep 29, 2023Updated 2 years ago
- Efficient PScan implementation in PyTorch☆17Jan 2, 2024Updated 2 years ago
- Long-context pretrained encoder-decoder models☆96Oct 28, 2022Updated 3 years ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"☆248Jun 6, 2025Updated 8 months ago
- ☆31Jul 2, 2023Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer☆54Nov 21, 2022Updated 3 years ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling☆40Dec 2, 2023Updated 2 years ago
- Long Range Arena for Benchmarking Efficient Transformers☆781Dec 16, 2023Updated 2 years ago
- ☆13Feb 7, 2023Updated 3 years ago
- Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar".☆12Jan 4, 2024Updated 2 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- Convolutions for Sequence Modeling☆912Jun 13, 2024Updated last year
- Understand and test language model architectures on synthetic tasks.☆254Updated this week
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"☆170Jan 30, 2025Updated last year
- Code for the ALiBi method for transformer language models (ICLR 2022)☆552Oct 30, 2023Updated 2 years ago
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).☆228Apr 18, 2022Updated 3 years ago
- Trains Transformer model variants. Data isn't shuffled between batches.☆143Oct 5, 2022Updated 3 years ago
- Foundation Architecture for (M)LLMs☆3,135Apr 11, 2024Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model☆56Dec 4, 2024Updated last year
- Accelerated First Order Parallel Associative Scan☆194Jan 7, 2026Updated last month
- [NeurIPS 2022]MorphTE: Injecting Morphology in Tensorized Embeddings☆17Oct 29, 2022Updated 3 years ago
- Curse-of-memory phenomenon of RNNs in sequence modelling☆19May 8, 2025Updated 9 months ago
- ☆259Jun 6, 2025Updated 8 months ago
- Pytorch library for fast transformer implementations☆1,762Mar 23, 2023Updated 2 years ago
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"☆1,065Mar 7, 2024Updated last year
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 2 years ago
- ☆20May 30, 2024Updated last year
- Implementation of https://srush.github.io/annotated-s4☆512Jun 20, 2025Updated 8 months ago
- Triton Implementation of HyperAttention Algorithm☆48Dec 11, 2023Updated 2 years ago
- ☆106Mar 9, 2024Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆27Apr 17, 2024Updated last year
- ☆20Apr 17, 2023Updated 2 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated last year