HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆554 · Updated 5 months ago
Alternatives and similar repositories for m2
Users interested in m2 are comparing it to the repositories listed below.
- Annotated version of the Mamba paper ☆485 · Updated last year
- Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in Pytorch ☆409 · Updated 5 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆235 · Updated 2 weeks ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆230 · Updated 9 months ago
- Large Context Attention ☆716 · Updated 4 months ago
- Understand and test language model architectures on synthetic tasks. ☆217 · Updated 2 weeks ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆289 · Updated last year
- Language Modeling with the H3 State Space Model ☆519 · Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆698 · Updated last year
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆388 · Updated 11 months ago
- Scaling Data-Constrained Language Models ☆335 · Updated 9 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆519 · Updated last month
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆644 · Updated 5 months ago
- The repository for the code of the UltraFastBERT paper ☆516 · Updated last year
- A repository for research on medium-sized language models. ☆498 · Updated 2 weeks ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆277 · Updated last year
- ☆190 · Updated this week
- Multipack distributed sampler for fast padding-free training of LLMs ☆191 · Updated 10 months ago
- Code repository for Black Mamba ☆246 · Updated last year
- Official PyTorch implementation of QA-LoRA ☆137 · Updated last year
- A bagel, with everything. ☆321 · Updated last year
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆284 · Updated 2 weeks ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆257 · Updated last year
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆617 · Updated last year
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition ☆640 · Updated 11 months ago
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆881 · Updated last month
- Code for Adam-mini: Use Fewer Learning Rates To Gain More (https://arxiv.org/abs/2406.16793) ☆421 · Updated last month
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆337 · Updated 6 months ago
- Official implementation of TransNormerLLM: A Faster and Better LLM ☆244 · Updated last year
- ☆223 · Updated last year