HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆557 · Updated 7 months ago
Alternatives and similar repositories for m2
Users interested in m2 are comparing it to the libraries listed below.
- Annotated version of the Mamba paper ☆487 · Updated last year
- The repository for the code of the UltraFastBERT paper ☆517 · Updated last year
- ☆194 · Updated 2 weeks ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆388 · Updated last year
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆904 · Updated 3 months ago
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆649 · Updated 7 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆291 · Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆708 · Updated last year
- A repository for research on medium sized language models. ☆509 · Updated 2 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆536 · Updated 3 months ago
- Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,… ☆226 · Updated last year
- Understand and test language model architectures on synthetic tasks. ☆221 · Updated last month
- Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in Pytorch ☆413 · Updated 7 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆240 · Updated 2 months ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 ☆433 · Updated 3 months ago
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆928 · Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆344 · Updated 8 months ago
- Official PyTorch implementation of QA-LoRA ☆138 · Updated last year
- Code repository for the paper - "Matryoshka Representation Learning" ☆541 · Updated last year
- Reference implementation of Megalodon 7B model ☆524 · Updated 3 months ago
- Code repository for Black Mamba ☆254 · Updated last year
- Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates ☆461 · Updated last year
- Scaling Data-Constrained Language Models ☆339 · Updated last month
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆377 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆199 · Updated last year
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript ☆599 · Updated last year
- ☆416 · Updated last year
- Large Context Attention ☆727 · Updated 7 months ago
- Language Modeling with the H3 State Space Model ☆519 · Updated last year
- Website for hosting the Open Foundation Models Cheat Sheet. ☆267 · Updated 3 months ago