devvrit / matformer
MatFormer repo
☆47 · Updated 7 months ago
Alternatives and similar repositories for matformer
Users interested in matformer are comparing it to the libraries listed below.
- DPO, but faster · ☆43 · Updated 7 months ago
- A repository for research on medium sized language models. · ☆77 · Updated last year
- https://x.com/BlinkDL_AI/status/1884768989743882276 · ☆28 · Updated 2 months ago
- Train, tune, and infer Bamba model · ☆130 · Updated last month
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" · ☆98 · Updated 9 months ago
- ☆82 · Updated 10 months ago
- GoldFinch and other hybrid transformer components · ☆46 · Updated 11 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ☆60 · Updated 10 months ago
- ☆35 · Updated last year
- Lightweight toolkit package to train and fine-tune 1.58bit Language models · ☆81 · Updated last month
- Collection of autoregressive model implementations · ☆85 · Updated 2 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … · ☆47 · Updated 2 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) · ☆158 · Updated 3 months ago
- ☆59 · Updated 3 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) · ☆33 · Updated 4 months ago
- ☆48 · Updated 8 months ago
- ☆68 · Updated last year
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. · ☆36 · Updated last year
- RWKV-7: Surpassing GPT · ☆92 · Updated 7 months ago
- MEXMA: Token-level objectives improve sentence representations · ☆41 · Updated 6 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… · ☆49 · Updated 3 months ago
- ☆48 · Updated 10 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind · ☆127 · Updated 10 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. · ☆64 · Updated last year
- This is the official repository for Inheritune. · ☆111 · Updated 5 months ago
- [NeurIPS 2024] Low rank memory efficient optimizer without SVD · ☆30 · Updated 2 weeks ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … · ☆59 · Updated 9 months ago
- ☆45 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM · ☆55 · Updated last year
- ☆81 · Updated last year