lliu606 / COSMOSLinks

☆14

Alternatives and similar repositories for COSMOS

Users that are interested in COSMOS are comparing it to the libraries listed below

Sorting:

RobertCsordas / moeut
☆89Updated last year
berlino / seq_icl
☆53Updated last year
wdlctc / mini-s
☆53Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆128Updated last year
JacobPfau / fillerTokens
☆75Updated last year
VITA-Group / WeLore
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆52Updated last month
ScalingIntelligence / large_language_monkeys
☆110Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated last year
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33Updated last year
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆77Updated last year
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆79Updated last year
jxiw / MambaByte
[CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model
☆24Updated last year
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆111Updated 7 months ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆157Updated 8 months ago
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆31Updated 9 months ago
zaydzuhri / softpick-attention
Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"
☆85Updated 3 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆106Updated 2 months ago
apple / ml-dataset-decomposition
Official repo of dataset-decomposition paper [NeurIPS 2024]
☆20Updated 11 months ago
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆38Updated last month
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆86Updated last year
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆35Updated 9 months ago
dmis-lab / Monet
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
☆74Updated 5 months ago
BorealisAI / flora-opt
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
☆105Updated last year
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆40Updated 2 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆62Updated 5 months ago
recursal / RADLADS-paper
RADLADS training code
☆35Updated 7 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆180Updated 5 months ago
Lagooon / LeanSTaR
☆42Updated last year
StigLidu / TURN
[ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"
☆21Updated 10 months ago