devvrit / matformerLinks

MatFormer repo

☆66

Alternatives and similar repositories for matformer

Users that are interested in matformer are comparing it to the libraries listed below

Sorting:

ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60Updated last year
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48Updated last year
RobertCsordas / moeut
☆89Updated last year
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆115Updated 9 months ago
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆136Updated 6 months ago
joey00072 / ohara
Collection of autoregressive model implementation
☆85Updated 7 months ago
NVlabs / hymba
☆202Updated 11 months ago
VITA-Group / WeLore
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆51Updated last month
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆100Updated 8 months ago
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆180Updated 5 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆100Updated 6 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆111Updated 7 months ago
mistralai / mistral-evals
☆78Updated 2 weeks ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131Updated last month
minyoungg / LTE
☆69Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 7 months ago
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28Updated 7 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆102Updated last year
allenai / infinigram-api
☆87Updated this week
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆168Updated 10 months ago
Zyphra / Zyda_processing
☆39Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆85Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174Updated 5 months ago
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆35Updated 9 months ago
LLM360 / k2-train
☆52Updated last year
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆123Updated 10 months ago
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆58Updated this week