HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆548 · Updated 2 months ago
Alternatives and similar repositories for m2:
Users who are interested in m2 are comparing it to the libraries listed below.
- Annotated version of the Mamba paper ☆475 · Updated last year
- Implementation of Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆506 · Updated 4 months ago
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆638 · Updated 2 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…" ☆287 · Updated 10 months ago
- Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in Pytorch ☆407 · Updated 2 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆691 · Updated 11 months ago
- ☆182 · Updated this week
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆855 · Updated last month
- Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,… ☆226 · Updated last year
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ☆255 · Updated last year
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆223 · Updated last month
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆223 · Updated last month
- Understand and test language model architectures on synthetic tasks. ☆185 · Updated 2 weeks ago
- Official PyTorch implementation of QA-LoRA ☆129 · Updated last year
- Recurrent Memory Transformer ☆149 · Updated last year
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆382 · Updated 8 months ago
- Some preliminary explorations of Mamba's context scaling. ☆212 · Updated last year
- The repository for the code of the UltraFastBERT paper ☆517 · Updated 11 months ago
- Scaling Data-Constrained Language Models ☆333 · Updated 6 months ago
- Large Context Attention ☆693 · Updated 2 months ago
- Code repository for Black Mamba ☆241 · Updated last year
- A repository for research on medium sized language models. ☆493 · Updated 2 months ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More (https://arxiv.org/abs/2406.16793) ☆398 · Updated 3 months ago
- Convolutions for Sequence Modeling ☆878 · Updated 9 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆405 · Updated 11 months ago
- The official implementation of "Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training" ☆955 · Updated last year
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆318 · Updated 9 months ago
- A simple and effective LLM pruning approach. ☆723 · Updated 7 months ago
- Language Modeling with the H3 State Space Model ☆516 · Updated last year
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆705 · Updated 5 months ago