Infini-AI-Lab / gsm_infinite
☆36 · Updated last month
Alternatives and similar repositories for gsm_infinite:
Users interested in gsm_infinite are comparing it to the libraries listed below.
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆44 · Updated 8 months ago
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" ☆44 · Updated 5 months ago
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆44 · Updated this week
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆59 · Updated 5 months ago
- Layer-Condensed KV cache with 10x larger batch size, fewer params, and less computation. Dramatic speedup with better task performance… ☆148 · Updated 2 months ago
- ☆87 · Updated 6 months ago
- Using FlexAttention to compute attention with different masking patterns ☆42 · Updated 6 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆23 · Updated last week
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆111 · Updated 3 months ago
- ☆125 · Updated last year
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch ☆53 · Updated this week
- The simplest implementation of recent sparse attention patterns for efficient LLM inference ☆58 · Updated 2 months ago
- Efficient Triton implementation of Native Sparse Attention ☆116 · Updated this week
- Cascade Speculative Drafting ☆29 · Updated 11 months ago
- ☆76 · Updated 2 months ago
- ☆40 · Updated 3 weeks ago
- PyTorch implementation of "Compressed Context Memory for Online Language Model Interaction" (ICLR'24) ☆55 · Updated 11 months ago
- ☆60 · Updated 11 months ago
- ☆36 · Updated 7 months ago
- ☆50 · Updated 5 months ago
- Transformers components, but in Triton ☆32 · Updated last week
- Codebase for Instruction Following without Instruction Tuning ☆33 · Updated 6 months ago
- ☆72 · Updated this week
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆71 · Updated 11 months ago
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆40 · Updated 5 months ago
- DPO, but faster 🚀 ☆40 · Updated 3 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 · Updated 6 months ago
- Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262) ☆60 · Updated last year
- ☆74 · Updated 7 months ago