Sequence modeling with Mega.
☆303 · Jan 28, 2023 · Updated 3 years ago
Alternatives and similar repositories for mega
Users who are interested in mega are comparing it to the libraries listed below.
- FairSeq repo with Apollo optimizer ☆113 · Dec 20, 2023 · Updated 2 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Aug 26, 2023 · Updated 2 years ago
- ☆165 · Jan 24, 2023 · Updated 3 years ago
- Structured state space sequence models ☆2,890 · Jul 17, 2024 · Updated last year
- ☆318 · Jan 8, 2025 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆252 · Jun 6, 2025 · Updated 10 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆68 · Apr 24, 2024 · Updated 2 years ago
- Language Modeling with the H3 State Space Model ☆523 · Sep 29, 2023 · Updated 2 years ago
- ☆53 · Jan 19, 2023 · Updated 3 years ago
- Long-context pretrained encoder-decoder models ☆96 · Oct 28, 2022 · Updated 3 years ago
- ☆31 · Jul 2, 2023 · Updated 2 years ago
- Trains Transformer model variants. Data isn't shuffled between batches. ☆145 · Oct 5, 2022 · Updated 3 years ago
- Efficient PScan implementation in PyTorch ☆17 · Jan 2, 2024 · Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆54 · Nov 21, 2022 · Updated 3 years ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆40 · Dec 2, 2023 · Updated 2 years ago
- Long Range Arena for Benchmarking Efficient Transformers ☆787 · Dec 16, 2023 · Updated 2 years ago
- Understand and test language model architectures on synthetic tasks. ☆265 · Mar 22, 2026 · Updated last month
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆171 · Jan 30, 2025 · Updated last year
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆102 · Feb 25, 2023 · Updated 3 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆57 · Dec 4, 2024 · Updated last year
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021). ☆228 · Apr 18, 2022 · Updated 4 years ago
- Code for the ALiBi method for transformer language models (ICLR 2022) ☆555 · Oct 30, 2023 · Updated 2 years ago
- [NeurIPS 2022] MorphTE: Injecting Morphology in Tensorized Embeddings ☆17 · Oct 29, 2022 · Updated 3 years ago
- Convolutions for Sequence Modeling ☆911 · Jun 13, 2024 · Updated last year
- Accelerated First Order Parallel Associative Scan ☆197 · Jan 7, 2026 · Updated 3 months ago
- Foundation Architecture for (M)LLMs ☆3,136 · Apr 11, 2024 · Updated 2 years ago
- ☆107 · Mar 9, 2024 · Updated 2 years ago
- ☆13 · Feb 7, 2023 · Updated 3 years ago
- Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar". ☆12 · Jan 4, 2024 · Updated 2 years ago
- ☆260 · Jun 6, 2025 · Updated 10 months ago
- ☆23 · Oct 15, 2022 · Updated 3 years ago
- Implementation of https://srush.github.io/annotated-s4 ☆515 · Jun 20, 2025 · Updated 10 months ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 3 years ago
- Pytorch library for fast transformer implementations ☆1,769 · Mar 23, 2023 · Updated 3 years ago
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,064 · Mar 7, 2024 · Updated 2 years ago
- Triton Implementation of HyperAttention Algorithm ☆48 · Dec 11, 2023 · Updated 2 years ago
- Reference implementation of Megalodon 7B model ☆526 · May 17, 2025 · Updated 11 months ago
- ☆20 · May 30, 2024 · Updated last year