HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆547 · Updated last month
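To make the "sub-quadratic GEMM-based" idea concrete, below is a minimal sketch of a Monarch-style structured matmul: two batched GEMMs (block-diagonal factors) around a transpose, costing roughly O(n·√n) instead of O(n²) for a dense layer. The function name, shapes, and initialization are illustrative assumptions, not the m2 repo's actual kernels or API.

```python
# Illustrative sketch only -- not the m2 repo's actual implementation.
# A Monarch-style matrix on n = m * m dimensions factors into two
# block-diagonal matrices interleaved with a transpose, so applying it
# takes two batched GEMMs instead of one dense n x n matmul.
import torch

def monarch_matmul(x, blocks1, blocks2):
    """
    x:       (batch, n) input, with n = m * m
    blocks1: (m, m, m) -- m blocks of size m x m (first block-diagonal factor)
    blocks2: (m, m, m) -- m blocks of size m x m (second block-diagonal factor)
    """
    b, n = x.shape
    m = blocks1.shape[0]
    assert n == m * m
    x = x.view(b, m, m)                            # split the n dim into (m, m)
    x = torch.einsum("bkj,kij->bki", x, blocks1)   # batched GEMM, block k acts on chunk k
    x = x.transpose(1, 2)                          # permutation between the two factors
    x = torch.einsum("bkj,kij->bki", x, blocks2)   # second batched GEMM
    return x.reshape(b, n)

# Usage: n = 1024 -> m = 32, so the two factors hold 2 * 32^3 = 65,536
# parameters versus 1,048,576 for a dense 1024 x 1024 weight matrix.
x = torch.randn(4, 1024)
w1 = torch.randn(32, 32, 32) / 32 ** 0.5
w2 = torch.randn(32, 32, 32) / 32 ** 0.5
y = monarch_matmul(x, w1, w2)   # shape (4, 1024)
```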
Alternatives and similar repositories for m2:
Users interested in m2 are comparing it to the libraries listed below
- Annotated version of the Mamba paper ☆473 · Updated 11 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆841 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs (a minimal sketch of this mechanism appears after this list). Conceptually, spars… ☆297 · Updated 2 months ago
- [ICLR 2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆514 · Updated last week
- Helpful tools and examples for working with flex-attention ☆647 · Updated this week
- Large Context Attention ☆684 · Updated 3 weeks ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆502 · Updated 3 months ago
- A repository for research on medium-sized language models. ☆491 · Updated last month
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆687 · Updated 10 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆221 · Updated this week
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆255 · Updated last year
- Official PyTorch implementation of QA-LoRA ☆126 · Updated 11 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"☆215Updated 3 weeks ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction☆380Updated 7 months ago
- Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in PyTorch ☆405 · Updated last month
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆398 · Updated 10 months ago
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch ☆637 · Updated last month
- Multipack distributed sampler for fast padding-free training of LLMs ☆184 · Updated 6 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆700 · Updated 4 months ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆306 · Updated 8 months ago
- Understand and test language model architectures on synthetic tasks. ☆181 · Updated last month
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆286 · Updated 9 months ago
- Implementation of DoRA ☆290 · Updated 8 months ago
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ☆1,083 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆266 · Updated last year
- Beyond Language Models: Byte Models are Digital World Simulators ☆319 · Updated 8 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆585 · Updated 11 months ago
- Some preliminary explorations of Mamba's context scaling. ☆213 · Updated last year
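As referenced in the memory-layers entry above, here is a minimal sketch of a trainable key-value lookup layer: a flat key/value table queried with top-k retrieval, so the added parameters live in the tables while each token only reads k slots. The class name, shapes, and hyperparameters are illustrative assumptions, not any listed repo's implementation; production memory layers typically use product keys or sharded lookups so they never score every key.

```python
# Illustrative sketch of a trainable key-value memory lookup -- not any
# specific repo's implementation. Extra capacity sits in the `keys` and
# `values` tables; each token mixes only its top-k value rows, so the
# parameter count grows without a matching increase in per-token FLOPs
# (aside from the naive dense scoring done here for simplicity).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    def __init__(self, d_model, num_slots, topk=32):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.topk = topk

    def forward(self, x):                          # x: (batch, seq, d_model)
        q = self.query_proj(x)                     # project hidden states to queries
        scores = q @ self.keys.t()                 # similarity to every memory key
        top_scores, top_idx = scores.topk(self.topk, dim=-1)
        weights = F.softmax(top_scores, dim=-1)    # mix weights over the top-k slots
        top_vals = self.values[top_idx]            # gather (batch, seq, topk, d_model)
        out = (weights.unsqueeze(-1) * top_vals).sum(dim=-2)
        return x + out                             # residual add, like an MLP block

# Usage: the memory adds num_slots * 2 * d_model parameters, but each
# token reads only `topk` value rows per forward pass.
layer = MemoryLayer(d_model=256, num_slots=4096, topk=32)
h = torch.randn(2, 16, 256)
print(layer(h).shape)   # torch.Size([2, 16, 256])
```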