lucidrains / MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
☆655 · Updated Dec 27, 2024
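For orientation, here is a minimal usage sketch of the package this page indexes. It assumes the MEGABYTE class and the constructor, return_loss, and generate arguments shown in the repository's README; treat the exact signature as an assumption and check the repo before relying on it.

```python
# Hedged sketch: one training step of MEGABYTE-pytorch on random token IDs, then sampling.
# The class name and keyword arguments below follow the repository's README;
# verify them against the current version of the package.
import torch
from MEGABYTE_pytorch import MEGABYTE

model = MEGABYTE(
    num_tokens = 16000,       # vocabulary size (e.g. 256 for raw bytes)
    dim = (512, 256),         # model dimension per (global, local) stage
    max_seq_len = (1024, 4),  # sequence length per stage; product = total length
    depth = (6, 2),           # transformer depth per stage
    dim_head = 64,
    heads = 8,
    flash_attn = True
)

# input shape is (batch, *max_seq_len)
x = torch.randint(0, 16000, (1, 1024, 4))

loss = model(x, return_loss = True)   # autoregressive training loss
loss.backward()

# after training, sample hierarchically (global stage first, then local)
sampled = model.generate(temperature = 0.9, filter_thres = 0.9)
```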
Alternatives and similar repositories for MEGABYTE-pytorch
Users interested in MEGABYTE-pytorch are comparing it to the libraries listed below.
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT ☆224 · Updated Aug 20, 2024
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)… ☆14,351 · Updated this week
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,316 · Updated Mar 6, 2025
- Convolutions for Sequence Modeling ☆911 · Updated Jun 13, 2024
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ☆1,148 · Updated Jan 11, 2024
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,066 · Updated Mar 7, 2024
- The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training” ☆981 · Updated Jan 30, 2024
- 🦁 Lion, a new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(W), in Pytorch ☆2,184 · Updated Nov 27, 2024
- Beyond Language Models: Byte Models are Digital World Simulators ☆334 · Updated Jun 6, 2024
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta ☆125 · Updated Feb 6, 2026
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆123 · Updated Oct 17, 2024
- Fast and memory-efficient exact attention ☆22,231 · Updated this week
- My attempts at applying the SoundStream design to learned tokenization of text, and then applying hierarchical attention to text generation ☆90 · Updated Oct 11, 2024
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆248 · Updated Jun 6, 2025
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,837 · Updated Jun 10, 2024
- Foundation Architecture for (M)LLMs ☆3,133 · Updated Apr 11, 2024
- Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch ☆2,619 · Updated Jan 12, 2025
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Updated Aug 26, 2023
- ImageBind: One Embedding Space to Bind Them All ☆8,971 · Updated Nov 21, 2025
- Un-*** 50 billion multimodal dataset ☆23 · Updated Sep 14, 2022
- Mamba SSM architecture ☆17,186 · Updated Jan 12, 2026
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆67 · Updated Apr 24, 2024
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,188 · Updated Jul 11, 2024
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆562 · Updated Dec 28, 2024
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span … ☆14 · Updated Aug 25, 2023
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆341 · Updated Feb 23, 2025
- Official implementation of TransNormerLLM: A Faster and Better LLM ☆250 · Updated Jan 23, 2024
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 · Updated Feb 13, 2023
- The RedPajama-Data repository contains code for preparing large datasets for training large language models. ☆4,924 · Updated Dec 7, 2024
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Updated Aug 14, 2024
- An Open-source Streaming High-fidelity Neural Audio Codec ☆498 · Updated Mar 4, 2025
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,706 · Updated Jun 25, 2024
- Vector (and Scalar) Quantization, in Pytorch ☆3,870 · Updated this week
- The repository for the code of the UltraFastBERT paper ☆519 · Updated Mar 24, 2024
- Implementation of Block Recurrent Transformer - Pytorch ☆224 · Updated Aug 20, 2024
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in P… ☆207 · Updated Feb 14, 2024
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆804 · Updated Jan 30, 2026
- Muon is Scalable for LLM Training ☆1,432 · Updated Aug 3, 2025
- Minimalistic large language model 3D-parallelism training ☆2,559 · Updated this week