Sequence modeling with Mega.
☆303 · Jan 28, 2023 · Updated 3 years ago
Alternatives and similar repositories for mega
Users that are interested in mega are comparing it to the libraries listed below.
- FairSeq repo with Apollo optimizer ☆113 · Dec 20, 2023 · Updated 2 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Aug 26, 2023 · Updated 2 years ago
- ☆165 · Jan 24, 2023 · Updated 3 years ago
- Structured state space sequence models ☆2,875 · Jul 17, 2024 · Updated last year
- ☆317 · Jan 8, 2025 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆251 · Jun 6, 2025 · Updated 10 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆68 · Apr 24, 2024 · Updated last year
- Language Modeling with the H3 State Space Model ☆522 · Sep 29, 2023 · Updated 2 years ago
- ☆53 · Jan 19, 2023 · Updated 3 years ago
- Long-context pretrained encoder-decoder models ☆96 · Oct 28, 2022 · Updated 3 years ago
- ☆31 · Jul 2, 2023 · Updated 2 years ago
- Trains Transformer model variants. Data isn't shuffled between batches. ☆143 · Oct 5, 2022 · Updated 3 years ago
- Efficient PScan implementation in PyTorch ☆17 · Jan 2, 2024 · Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆54 · Nov 21, 2022 · Updated 3 years ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆40 · Dec 2, 2023 · Updated 2 years ago
- Long Range Arena for Benchmarking Efficient Transformers ☆788 · Dec 16, 2023 · Updated 2 years ago
- Understand and test language model architectures on synthetic tasks. ☆264 · Mar 22, 2026 · Updated 2 weeks ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆170 · Jan 30, 2025 · Updated last year
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆102 · Feb 25, 2023 · Updated 3 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆57 · Dec 4, 2024 · Updated last year
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021). ☆228 · Apr 18, 2022 · Updated 3 years ago
- Code for the ALiBi method for transformer language models (ICLR 2022) ☆555 · Oct 30, 2023 · Updated 2 years ago
- [NeurIPS 2022] MorphTE: Injecting Morphology in Tensorized Embeddings ☆17 · Oct 29, 2022 · Updated 3 years ago
- Convolutions for Sequence Modeling ☆911 · Jun 13, 2024 · Updated last year
- Accelerated First Order Parallel Associative Scan ☆197 · Jan 7, 2026 · Updated 3 months ago
- Foundation Architecture for (M)LLMs ☆3,133 · Apr 11, 2024 · Updated last year
- ☆107 · Mar 9, 2024 · Updated 2 years ago
- ☆13 · Feb 7, 2023 · Updated 3 years ago
- Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar". ☆12 · Jan 4, 2024 · Updated 2 years ago
- ☆260 · Jun 6, 2025 · Updated 10 months ago
- ☆23 · Oct 15, 2022 · Updated 3 years ago
- Implementation of https://srush.github.io/annotated-s4 ☆515 · Jun 20, 2025 · Updated 9 months ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 3 years ago
- Pytorch library for fast transformer implementations ☆1,767 · Mar 23, 2023 · Updated 3 years ago
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,065 · Mar 7, 2024 · Updated 2 years ago
- Triton Implementation of HyperAttention Algorithm ☆48 · Dec 11, 2023 · Updated 2 years ago
- Reference implementation of Megalodon 7B model ☆527 · May 17, 2025 · Updated 10 months ago
- ☆20 · May 30, 2024 · Updated last year