facebookresearch/mega

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/facebookresearch/mega)

facebookresearch / mega

Sequence modeling with Mega.

☆303

Alternatives and similar repositories for mega

Users that are interested in mega are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

XuezheMax / fairseq-apollo
View on GitHub
FairSeq repo with Apollo optimizer
☆113Dec 20, 2023Updated 2 years ago
lucidrains / Mega-pytorch
View on GitHub
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆207Aug 26, 2023Updated 2 years ago
ctlllll / SGConv
View on GitHub
☆165Jan 24, 2023Updated 3 years ago
state-spaces / s4
View on GitHub
Structured state space sequence models
☆2,911Jul 17, 2024Updated 2 years ago
lindermanlab / S5
View on GitHub
☆324Jan 8, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
HazyResearch / based
View on GitHub
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆256Jun 6, 2025Updated last year
OpenNLPLab / HGRN
View on GitHub
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆68Apr 24, 2024Updated 2 years ago
HazyResearch / H3
View on GitHub
Language Modeling with the H3 State Space Model
☆525Sep 29, 2023Updated 2 years ago
microsoft / EfficientLongSequenceModeling
View on GitHub
☆54Jan 19, 2023Updated 3 years ago
facebookresearch / bart_ls
View on GitHub
Long-context pretrained encoder-decoder models
☆97Oct 28, 2022Updated 3 years ago
Shark-NLP / CAB
View on GitHub
☆31Jul 2, 2023Updated 3 years ago
facebookresearch / transformer-sequential
View on GitHub
Trains Transformer model variants. Data isn't shuffled between batches.
☆147Oct 5, 2022Updated 3 years ago
maximzubkov / fft-scan
View on GitHub
Efficient PScan implementation in PyTorch
☆17Jan 2, 2024Updated 2 years ago
machelreid / m2d2
View on GitHub
M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer
☆54Nov 21, 2022Updated 3 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
renll / SeqBoat
View on GitHub
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆40Dec 2, 2023Updated 2 years ago
google-research / long-range-arena
View on GitHub
Long Range Arena for Benchmarking Efficient Transformers
☆787Dec 16, 2023Updated 2 years ago
HazyResearch / zoology
View on GitHub
Understand and test language model architectures on synthetic tasks.
☆277Mar 22, 2026Updated 3 months ago
goombalab / hydra
View on GitHub
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
☆174Jan 30, 2025Updated last year
jenni-ai / T2FW
View on GitHub
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20Oct 9, 2022Updated 3 years ago
lucidrains / gated-state-spaces-pytorch
View on GitHub
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Feb 25, 2023Updated 3 years ago
Cranial-XIX / longhorn
View on GitHub
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆57Dec 4, 2024Updated last year
NVIDIA / transformer-ls
View on GitHub
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆228Apr 18, 2022Updated 4 years ago
ofirpress / attention_with_linear_biases
View on GitHub
Code for the ALiBi method for transformer language models (ICLR 2022)
☆558Oct 30, 2023Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
HazyResearch / safari
View on GitHub
Convolutions for Sequence Modeling
☆915Jun 13, 2024Updated 2 years ago
bigganbing / Fairseq_MorphTE
View on GitHub
[NeurIPS 2022]MorphTE: Injecting Morphology in Tensorized Embeddings
☆17Oct 29, 2022Updated 3 years ago
microsoft / torchscale
View on GitHub
Foundation Architecture for (M)LLMs
☆3,133Apr 11, 2024Updated 2 years ago
proger / accelerated-scan
View on GitHub
Accelerated First Order Parallel Associative Scan
☆198Jan 7, 2026Updated 6 months ago
berlino / gated_linear_attention
View on GitHub
☆107Mar 9, 2024Updated 2 years ago
emorynlp / seq2seq-corenlp
View on GitHub
☆13Feb 7, 2023Updated 3 years ago
ictnlp / PCFG-NAT
View on GitHub
Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar".
☆12Jan 4, 2024Updated 2 years ago
google-research / meliad
View on GitHub
☆261Jun 6, 2025Updated last year
violet-zct / swarm-distillation-zero-shot
View on GitHub
☆23Oct 15, 2022Updated 3 years ago
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
srush / annotated-s4
View on GitHub
Implementation of https://srush.github.io/annotated-s4
☆519Jun 20, 2025Updated last year
da03 / criticize_text_generation
View on GitHub
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆12Mar 18, 2023Updated 3 years ago
abertsch72 / unlimiformer
View on GitHub
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
☆1,062Mar 7, 2024Updated 2 years ago
idiap / fast-transformers
View on GitHub
Pytorch library for fast transformer implementations
☆1,773Mar 23, 2023Updated 3 years ago
yikangshen / megablocks
View on GitHub
☆20May 30, 2024Updated 2 years ago
amirzandieh / HyperAttention
View on GitHub
Triton Implementation of HyperAttention Algorithm
☆48Dec 11, 2023Updated 2 years ago
XuezheMax / megalodon
View on GitHub
Reference implementation of Megalodon 7B model
☆526May 17, 2025Updated last year