The evaluation framework for training-free sparse attention in LLMs
☆122 · Jan 27, 2026 · Updated 3 months ago
Alternatives and similar repositories for sparse-frontier
Users that are interested in sparse-frontier are comparing it to the libraries listed below.
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆92 · Jul 17, 2025 · Updated 10 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆277 · Jul 6, 2025 · Updated 10 months ago
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆168 · Oct 13, 2025 · Updated 7 months ago
- ☆20 · Mar 11, 2025 · Updated last year
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆131 · Nov 26, 2025 · Updated 6 months ago
- KV cache compression for high-throughput LLM inference ☆157 · Feb 5, 2025 · Updated last year
- Research work aimed at addressing the problem of modeling infinite-length context ☆48 · Dec 18, 2025 · Updated 5 months ago
- ☆138 · May 29, 2025 · Updated 11 months ago
- This is the official repo for the paper "Accelerating Parallel Sampling of Diffusion Models" Tang et al. ICML 2024 https://openreview.net… ☆16 · Jul 19, 2024 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆133 · Jun 24, 2025 · Updated 11 months ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆86 · Jan 12, 2025 · Updated last year
- ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25) ☆28 · Feb 26, 2026 · Updated 3 months ago
- Parallel Scaling Law for Language Model: Beyond Parameter and Inference Time Scaling ☆479 · May 17, 2025 · Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆150 · Feb 25, 2026 · Updated 3 months ago
- [VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system. ☆135 · Feb 22, 2026 · Updated 3 months ago
- Fork of Flame repo for training of some new stuff in development ☆19 · Apr 24, 2026 · Updated last month
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆299 · May 1, 2025 · Updated last year
- Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference ☆43 · Mar 28, 2026 · Updated last month
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆57 · Mar 31, 2026 · Updated last month
- Customized Inference Engine for Multiverse Models ☆25 · Jun 27, 2025 · Updated 10 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆541 · Feb 10, 2025 · Updated last year
- ☆20 · May 30, 2024 · Updated last year
- Code and data for paper "(How) do Language Models Track State?" ☆22 · Mar 31, 2025 · Updated last year
- ☆43 · Oct 11, 2025 · Updated 7 months ago
- PyTorch implementation of the Flash Spectral Transform Unit. ☆22 · Sep 19, 2024 · Updated last year
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆60 · Nov 20, 2024 · Updated last year
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆113 · Oct 11, 2025 · Updated 7 months ago
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs". ☆17 · Sep 15, 2024 · Updated last year
- ☆248 · Nov 19, 2025 · Updated 6 months ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆32 · Feb 25, 2025 · Updated last year
- Showing how to use CUDA on Google Colab ☆13 · Feb 24, 2025 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆93 · Sep 12, 2025 · Updated 8 months ago
- ☆14 · Oct 3, 2024 · Updated last year
- A collection of skills for vllm-omni ☆67 · Updated this week
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆30 · Jul 24, 2025 · Updated 10 months ago
- Unofficial implementation of paper: Exploring the Space of Key-Value-Query Models with Intention ☆12 · May 24, 2023 · Updated 3 years ago
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence ☆64 · Nov 11, 2025 · Updated 6 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆376 · Dec 12, 2024 · Updated last year
- Understanding deep networks and large models. ☆28 · Jan 23, 2026 · Updated 4 months ago