Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆207 · Aug 26, 2023 · Updated 2 years ago
Alternatives and similar repositories for Mega-pytorch
Users that are interested in Mega-pytorch are comparing it to the libraries listed below.
- Implementation of fused cosine similarity attention in the same style as Flash Attention☆220Feb 13, 2023Updated 3 years ago
- Sequence modeling with Mega.☆303Jan 28, 2023Updated 3 years ago
- FairSeq repo with Apollo optimizer☆114Dec 20, 2023Updated 2 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch☆879Oct 30, 2023Updated 2 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 3 years ago
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation☆90Oct 11, 2024Updated last year
- Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models☆30May 31, 2022Updated 3 years ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion from our EMNLP 2023 paper - Accelerating Toeplitz…☆14Oct 17, 2023Updated 2 years ago
- JAX implementation of ViT-VQGAN☆82Sep 21, 2022Updated 3 years ago
- Experiments on the impact of depth in transformers and SSMs.☆41Oct 23, 2025Updated 5 months ago
- Variable-order CRFs with structure learning☆17Aug 1, 2024Updated last year
- An implementation of (Induced) Set Attention Block, from the Set Transformers paper☆67Jan 10, 2023Updated 3 years ago
- Language Modeling with the H3 State Space Model☆522Sep 29, 2023Updated 2 years ago
- Structured state space sequence models☆2,869Jul 17, 2024Updated last year
- ☆29Jul 9, 2024Updated last year
- Implementation of a Transformer, but completely in Triton☆279Apr 5, 2022Updated 3 years ago
- Implementation of a holodeck, written in Pytorch☆18Nov 1, 2023Updated 2 years ago
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch☆46Mar 3, 2021Updated 5 years ago
- Implementation of Discrete Key / Value Bottleneck, in Pytorch☆88Jul 9, 2023Updated 2 years ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch☆135Oct 15, 2025Updated 5 months ago
- Explorations into the recently proposed Taylor Series Linear Attention☆100Aug 18, 2024Updated last year
- ☆107Mar 9, 2024Updated 2 years ago
- Optimized library for large-scale extraction of frames and audio from video.☆201Sep 11, 2023Updated 2 years ago
- Code for "Possibility Before Utility: Learning And Using Hierarchical Affordances" (ICLR 2022)☆14Mar 14, 2022Updated 4 years ago
- 🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch☆2,183Nov 27, 2024Updated last year
- Docker containers of baseline agents for the Crafter environment☆30Dec 14, 2021Updated 4 years ago
- Long Range Arena for Benchmarking Efficient Transformers☆786Dec 16, 2023Updated 2 years ago
- ☆164Jan 24, 2023Updated 3 years ago
- Exploring Binary Classification Loss for Speaker Verification☆18Jul 18, 2023Updated 2 years ago
- Latent Diffusion Language Models☆70Sep 20, 2023Updated 2 years ago
- Implementation of a U-net complete with efficient attention as well as the latest research findings☆292May 3, 2024Updated last year
- Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx…☆137Aug 2, 2023Updated 2 years ago
- [EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674☆195Jun 14, 2023Updated 2 years ago
- Sequence alignment methods with helpers for PyTorch.☆24Nov 30, 2022Updated 3 years ago
- Code for NeurIPS 2022 Paper, "Poisson Flow Generative Models" (PFGM)☆871Jun 6, 2023Updated 2 years ago
- ☆317Jan 8, 2025Updated last year
- ☆389Oct 18, 2023Updated 2 years ago
- ☆20May 30, 2024Updated last year