HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆536 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for m2
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆624 · Updated last month
- Transformers with Arbitrarily Large Context ☆637 · Updated 2 months ago
- Annotated version of the Mamba paper ☆455 · Updated 8 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆212 · Updated 2 months ago
- A repository for research on medium sized language models. ☆479 · Updated this week
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆474 · Updated 2 weeks ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆368 · Updated 4 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆279 · Updated 6 months ago
- Understand and test language model architectures on synthetic tasks. ☆161 · Updated 6 months ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆291 · Updated 4 months ago
- Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in Pytorch ☆393 · Updated 8 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆801 · Updated 2 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆642 · Updated last month
- Code repository for the paper "Matryoshka Representation Learning" ☆423 · Updated 8 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆553 · Updated 8 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆610 · Updated 5 months ago
- Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,… ☆226 · Updated 7 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆175 · Updated 3 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆667 · Updated 6 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆369 · Updated 3 weeks ago
- Minimalistic large language model 3D-parallelism training ☆1,227 · Updated this week
- Some preliminary explorations of Mamba's context scaling. ☆190 · Updated 9 months ago
- Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton ☆1,320 · Updated this week
- Official PyTorch implementation of QA-LoRA ☆116 · Updated 7 months ago
- Reference implementation of Megalodon 7B model ☆504 · Updated 6 months ago
- A bagel, with everything. ☆312 · Updated 6 months ago
- Helpful tools and examples for working with flex-attention ☆460 · Updated 2 weeks ago