skylight-org/sparse-attention-hub

Advancing the frontier of efficient AI

☆67

Alternatives and similar repositories for sparse-attention-hub

Users who are interested in sparse-attention-hub are comparing it to the libraries listed below.

xAlg-ai / HashAttention-1.0
☆18 · Sep 23, 2025 · Updated 10 months ago
mert-cemri / autoevolve
☆24 · Dec 6, 2025 · Updated 7 months ago
Jingyu6 / speculative_prefill
☆63 · May 19, 2025 · Updated last year
FrontierCS / Frontier-CS
A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.
☆288 · Updated this week
SakanaAI / fast-weight-product-key-memory
Code for Fast-weight Product Key Memory (FwPKM)
☆19 · Mar 18, 2026 · Updated 4 months ago
togethercomputer / saw-int4
Official implementation of Paper "System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving"
☆30 · Apr 17, 2026 · Updated 3 months ago
StarTrail-org / RAG-DS-Serve
[AAAI26]: DS SERVE: The Largest Open Vector Store over Pretraining Data; A Framework for Efficient and Scalable Neural Retrieval
☆53 · Jan 28, 2026 · Updated 6 months ago
ruipeterpan / specreason
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆74 · Oct 2, 2025 · Updated 9 months ago
SqueezeAILab / MultipoleAttention
[NeurIPS 2025] Multipole Attention for Efficient Long Context Reasoning
☆24 · Dec 5, 2025 · Updated 7 months ago
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆54 · Oct 18, 2024 · Updated last year
az1326 / advisor-models
How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models
☆82 · Feb 5, 2026 · Updated 5 months ago
MindLab-Research / longstraw
MinT-2M: Long-context training system for resident-prefix GRPO
☆39 · Updated this week
embedl / embedl-models
⛔ DEPRECATED -- use flash-head instead (pip install flash-head)
☆29 · Apr 10, 2026 · Updated 3 months ago
sspec-project / SparseSpec
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
☆116 · Dec 2, 2025 · Updated 7 months ago
Infini-AI-Lab / STEM
☆66 · May 7, 2026 · Updated 2 months ago
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆103 · May 16, 2025 · Updated last year
utah-scs / lsm-sim
Simulator for comparing memory allocation policies for caches.
☆20 · May 15, 2019 · Updated 7 years ago
thad0ctor / KrunchWrapper
☆18 · Jul 1, 2025 · Updated last year
XuezheMax / gecko-llm
Gecko Architecture
☆16 · Jan 13, 2026 · Updated 6 months ago
guestrin-lab / deepscholar
build and benchmark deep research
☆245 · Mar 28, 2026 · Updated 4 months ago
NVIDIA / kvpress
LLM KV cache compression made easy
☆1,147 · Updated this week
microsoft / glinthawk
An LLM inference engine, written in C++
☆20 · Mar 30, 2026 · Updated 3 months ago
answers111 / alpha-research
Repo for "AlphaResearch: Accelerating New Algorithm Discovery with Language Models"
☆58 · Nov 12, 2025 · Updated 8 months ago
chenyu-jiang / dcp
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21 · Nov 28, 2025 · Updated 8 months ago
netaz / caffe2any
a few utilities to analyze Caffe prototxt files
☆16 · Sep 27, 2017 · Updated 8 years ago
skydiscover-ai / skydiscover
AI-Driven Scientific and Algorithmic Discovery
☆590 · Jun 14, 2026 · Updated last month
mit-han-lab / flash-moba
☆251 · Nov 19, 2025 · Updated 8 months ago
mlc-ai / package
☆14 · Updated this week
RadicalNumerics / spear
Structured Primitives for Efficient Architecture Research
☆20 · Dec 22, 2025 · Updated 7 months ago
jacobfa / mot
☆15 · Sep 25, 2025 · Updated 10 months ago
apd10 / universal_memory_allocation
☆15 · Apr 26, 2022 · Updated 4 years ago
HarmanDotpy / pairwise-self-verification
[ICML 2026] Code for V1: Unifying Generation and Self-Verification for Parallel Reasoners.
☆39 · Mar 5, 2026 · Updated 4 months ago
FasterDecoding / SnapKV
☆327 · Jul 10, 2025 · Updated last year
Danielohayon / Block-Sparse-Flash-Attention
☆34 · Dec 10, 2025 · Updated 7 months ago
Hanchenli / vllm-continuum
Preview Code for Continuum Paper
☆91 · Jul 20, 2026 · Updated last week
badrobotics / FeRTOS
FeRTOS is a simple "operating system" that currently supports ARM Cortex-M CPUs
☆12 · Jul 9, 2022 · Updated 4 years ago
xie-lab-ml / piecewise-sparse-attention
Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers
☆37 · Jul 1, 2026 · Updated 3 weeks ago
open-lm-engine / lm-engine
LM engine is a library for pretraining/finetuning LLMs
☆184 · Updated this week
FlashML-org / flashlib
Fast and memory-efficient classical machine learning operators
☆548 · Updated this week