DerrickYLJ/LessIsMore

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/DerrickYLJ/LessIsMore)

DerrickYLJ / LessIsMore

[ICML 2026] Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

☆34

Alternatives and similar repositories for LessIsMore

Users that are interested in LessIsMore are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

DerrickYLJ / TidalDecode
View on GitHub
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆57Aug 6, 2025Updated 11 months ago
ruipeterpan / specreason
View on GitHub
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆74Oct 2, 2025Updated 9 months ago
Infini-AI-Lab / Sparrow
View on GitHub
☆16Jun 15, 2026Updated last month
Infini-AI-Lab / Kinetics
View on GitHub
Kinetics: Rethinking Test-Time Scaling Laws
☆87Jul 11, 2025Updated last year
sspec-project / SparseSpec
View on GitHub
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
☆115Dec 2, 2025Updated 7 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Infini-AI-Lab / vortex_torch
View on GitHub
Vortex: Programmable Sparse Attention for Agents as Algorithm Designers
☆68Jun 24, 2026Updated 3 weeks ago
humuyan / Korch
View on GitHub
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆41Mar 27, 2025Updated last year
gudiandian / ElasticFlow
View on GitHub
☆17May 10, 2024Updated 2 years ago
DS3Lab / AC-SGD
View on GitHub
Code associated with the paper **Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees**.
☆29Apr 25, 2023Updated 3 years ago
tilde-research / nsa-release
View on GitHub
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆133Jun 24, 2025Updated last year
shengliu66 / FractionalReason
View on GitHub
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17Jun 30, 2025Updated last year
jiazhihao / attention_superoptimizer
View on GitHub
An Attention Superoptimizer
☆22Jan 20, 2025Updated last year
ByteDance-Seed / ShadowKV
View on GitHub
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
☆310May 1, 2025Updated last year
furiosa-ai / draft-based-approx-llm
View on GitHub
[ICLR 2026] Draft-based Approximate Inference for LLMs
☆21Mar 10, 2026Updated 4 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ahennequ / pytorch-custom-mma
View on GitHub
☆30Oct 3, 2022Updated 3 years ago
mit-han-lab / Quest
View on GitHub
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆400Jul 10, 2025Updated last year
microsoft / SeerAttention
View on GitHub
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆213Jul 10, 2026Updated last week
chenyu-jiang / dcp
View on GitHub
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21Nov 28, 2025Updated 7 months ago
JungHoyoun / PromptCompressor
View on GitHub
☆12Apr 29, 2024Updated 2 years ago
hkust-nlp / model-task-align-rl
View on GitHub
[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆18Feb 9, 2026Updated 5 months ago
lemyx / tilelang-dsa
View on GitHub
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆47Nov 19, 2025Updated 8 months ago
mlc-ai / pith-train
View on GitHub
Compact and Agent-Native MoE Training System
☆295Updated this week
dywsjtu / apparate
View on GitHub
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆24Nov 21, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
StanfordLegion / gasnet
View on GitHub
☆13Jun 2, 2026Updated last month
YouAreSpecialToMe / QST
View on GitHub
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
☆49Nov 5, 2024Updated last year
princeton-pli / PruLong
View on GitHub
Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?"
☆48Jul 29, 2025Updated 11 months ago
sands-lab / FOCUS
View on GitHub
[ICML 2026] Official implementation of "FOCUS: DLLMs Know How to Tame Their Compute Bound".
☆17May 5, 2026Updated 2 months ago
huangyuxiang03 / Locret
View on GitHub
☆14Oct 3, 2024Updated last year
quantum-compiler / quartz
View on GitHub
The Quartz Quantum Compiler
☆91Jun 9, 2026Updated last month
uw-mad-dash / Accordion
View on GitHub
Code for reproducing experiments performed for Accoridon
☆13Jun 11, 2021Updated 5 years ago
IST-DASLab / gemm-fp8
View on GitHub
High Performance FP8 GEMM Kernels for SM89 and later GPUs.
☆21Jan 24, 2025Updated last year
FFY0 / AdaKV
View on GitHub
The Official Implementation of Ada-KV [NeurIPS 2025]
☆139Nov 26, 2025Updated 7 months ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
letta-ai / letta-terminalbench
View on GitHub
letta integration for terminalbench (#1 open source agent, in under 200 lines of code)
☆19Oct 22, 2025Updated 9 months ago
TIGER-AI-Lab / Hierarchical-Reasoner
View on GitHub
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
☆64Apr 11, 2026Updated 3 months ago
FibonaccciYan / Adamas
View on GitHub
Adamas: Hadamard Sparse Attention for Efficient Long-context Inference
☆15May 19, 2026Updated 2 months ago
microsoft / RetrievalAttention
View on GitHub
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.
☆147Feb 22, 2026Updated 5 months ago
NVlabs / DLER
View on GitHub
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
☆17Nov 11, 2025Updated 8 months ago
Multiverse4FM / Multiverse
View on GitHub
☆88Jun 16, 2025Updated last year
suu990901 / KlearReasoner
View on GitHub
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆82Dec 25, 2025Updated 6 months ago