SakanaAI/sparser-faster-llms

CUDA kernels that leverage LLM sparsity to improve throughput and reduce memory requirements during inference and training.

☆253

Alternatives and similar repositories for sparser-faster-llms

Users who are interested in sparser-faster-llms are comparing it to the libraries listed below.

chili-lab / LT2
Official Codebase: LT2: Linear-Time Looped Transformers.
☆49 · May 27, 2026 · Updated last month
SakanaAI / DiffusionBlocks
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
☆240 · Feb 18, 2026 · Updated 5 months ago
gouki510 / Analogy_in_Transformer
☆34 · Feb 1, 2026 · Updated 5 months ago
Chengsong-Huang / G-Zero
☆25 · May 14, 2026 · Updated 2 months ago
catswe / flash-attention-residuals
Triton kernels and PyTorch ops for Block Attention Residuals (AttnRes)
☆86 · May 29, 2026 · Updated last month
NVlabs / GatedDeltaNet-2
Official PyTorch Implementation of Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
☆246 · May 25, 2026 · Updated last month
mehdie79 / RTM_latent_refinement
☆22 · Jul 10, 2026 · Updated last week
lilakk / how2everything
Official code for "How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs"
☆24 · Feb 10, 2026 · Updated 5 months ago
znowu / CliqueFlowmer
Code with CliqueFlowmer model for Optimal Computational Materials Discovery
☆17 · Apr 21, 2026 · Updated 3 months ago
IST-DASLab / MatGPTQ
Code for MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization
☆22 · Feb 18, 2026 · Updated 5 months ago
hustvl / MoDA
A hardware-aware efficient implementation of "Mixture-of-Depths Attention".
☆274 · May 6, 2026 · Updated 2 months ago
Dao-AILab / gemm-cublas
☆22 · May 5, 2025 · Updated last year
JanTempus / tokenisation_lp
☆15 · May 20, 2026 · Updated 2 months ago
arberzela / pbt-nca
Population Based Training of Petri Dish Neural Cellular Automata
☆16 · Apr 14, 2026 · Updated 3 months ago
tanishqkumar / ssd
A lightweight inference engine supporting speculative speculative decoding (SSD).
☆970 · May 10, 2026 · Updated 2 months ago
IST-DASLab / Quartet-II
Quartet II Official Code
☆76 · May 1, 2026 · Updated 2 months ago
test-time-training / e2e
Official JAX implementation of End-to-End Test-Time Training for Long Context
☆625 · Feb 15, 2026 · Updated 5 months ago
chen-hao-chao / mdm-prime-v2
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models
☆27 · May 23, 2026 · Updated last month
Dao-AILab / gram-newton-schulz
Fast Polar Decomposition for Muon
☆166 · Jul 2, 2026 · Updated 2 weeks ago
Infini-AI-Lab / Sparrow
☆16 · Jun 15, 2026 · Updated last month
facebookresearch / threadweaver
The implementation for ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
☆67 · Apr 8, 2026 · Updated 3 months ago
PKU-AICare / ConfAgents
ConfAgents: A Conformal-Guided Multi-Agent Framework for Cost-Efficient Medical Diagnosis
☆15 · Aug 4, 2025 · Updated 11 months ago
martin-marek / batch-size
📄 Small Batch Size Training for Language Models
☆82 · Mar 18, 2026 · Updated 4 months ago
test-time-training / discover
☆606 · May 24, 2026 · Updated last month
apple / ml-ssd
☆796 · Apr 16, 2026 · Updated 3 months ago
martenlienen / bsi
Generative Modeling with Bayesian Sample Inference
☆24 · May 17, 2025 · Updated last year
IST-DASLab / gptq-gguf-toolkit
Efficient non-uniform quantization with GPTQ for GGUF
☆64 · Sep 17, 2025 · Updated 10 months ago
Embodied-Minds-Lab / BES
We propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomp…
☆166 · May 28, 2026 · Updated last month
deep-spin / adasplash
AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)
☆46 · May 20, 2026 · Updated 2 months ago
DunZhang / Jasper-Token-Compression-Training
The training code for Jasper-Token-Compression-600M
☆20 · Nov 19, 2025 · Updated 8 months ago
wdlctc / delta-attention-residuals-code
Delta Attention Residuals - supplementary code and pretrained models
☆40 · May 20, 2026 · Updated 2 months ago
facebookresearch / sparse-delta-memory
This repository contains the reference implementation for the Sparse Delta Memory paper. More precisely, it contains the model definitio…
☆30 · Jul 9, 2026 · Updated last week
SakanaAI / RLT
Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling.
☆363 · Jun 23, 2025 · Updated last year
ColfaxResearch / layout-categories
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
☆139 · Sep 24, 2025 · Updated 9 months ago
camel-ai / gecko
☆35 · Jul 8, 2026 · Updated last week
Dao-AILab / grouped-latent-attention
☆135 · May 29, 2025 · Updated last year
metauto-ai / NeuralComputer
🖥 Neural Computers' Data Engine
☆201 · May 19, 2026 · Updated 2 months ago
inclusionAI / Ring-V2.5
☆45 · Feb 28, 2026 · Updated 4 months ago
rimads / avey-b
Code for the Avey-B paper (https://arxiv.org/abs/2602.15814)
☆32 · Feb 21, 2026 · Updated 4 months ago