HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆536 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for m2
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆624 · Updated last month
- Transformers with Arbitrarily Large Context ☆637 · Updated 2 months ago
- Annotated version of the Mamba paper ☆455 · Updated 8 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆212 · Updated 2 months ago
- A repository for research on medium sized language models. ☆479 · Updated this week
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆474 · Updated 2 weeks ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆368 · Updated 4 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆279 · Updated 6 months ago
- Understand and test language model architectures on synthetic tasks. ☆161 · Updated 6 months ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆291 · Updated 4 months ago
- Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in Pytorch ☆393 · Updated 8 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆801 · Updated 2 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆642 · Updated last month
- Code repository for the paper "Matryoshka Representation Learning" ☆423 · Updated 8 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆553 · Updated 8 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆610 · Updated 5 months ago
- Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,… ☆226 · Updated 7 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆175 · Updated 3 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆667 · Updated 6 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆369 · Updated 3 weeks ago
- Minimalistic large language model 3D-parallelism training ☆1,227 · Updated this week
- Some preliminary explorations of Mamba's context scaling. ☆190 · Updated 9 months ago
- Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton ☆1,320 · Updated this week
- Official PyTorch implementation of QA-LoRA ☆116 · Updated 7 months ago
- Reference implementation of Megalodon 7B model ☆504 · Updated 6 months ago
- A bagel, with everything. ☆312 · Updated 6 months ago
- Helpful tools and examples for working with flex-attention ☆460 · Updated 2 weeks ago