KV Cache Steering for Inducing Reasoning in Small Language Models
☆46 Jul 24, 2025 Updated 7 months ago
Alternatives and similar repositories for cache-steering
Users who are interested in cache-steering are comparing it to the repositories listed below.
- ☆14 Mar 20, 2025 Updated 11 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 Sep 23, 2025 Updated 5 months ago
- ☆15 Jan 12, 2026 Updated last month
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 Jun 22, 2025 Updated 8 months ago
- ☆16 Jun 10, 2025 Updated 8 months ago
- ☆19 Jun 4, 2025 Updated 8 months ago
- ☆15 Apr 11, 2024 Updated last year
- ☆21 Jul 21, 2025 Updated 7 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 Dec 29, 2025 Updated 2 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆94 Nov 17, 2024 Updated last year
- Agent-RRM: Exploring Reasoning Reward Model for Agents ☆49 Updated this week
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆18 Oct 17, 2025 Updated 4 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 Mar 10, 2025 Updated 11 months ago
- ☆12 Apr 17, 2025 Updated 10 months ago
- Fork of Flame repo for training of some new stuff in development ☆19 Feb 20, 2026 Updated last week
- Algorithms for approximate attention in LLMs ☆21 Apr 14, 2025 Updated 10 months ago
- Official Code for "Learning to Reason via Mixture-of-Thought for Logical Reasoning" ☆26 Nov 20, 2025 Updated 3 months ago
- ☆21 May 3, 2025 Updated 10 months ago
- AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference ☆20 Jan 24, 2025 Updated last year
- ☆15 Feb 21, 2024 Updated 2 years ago
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning ☆28 Jul 14, 2025 Updated 7 months ago
- ☆24 Apr 3, 2025 Updated 11 months ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆30 Nov 12, 2024 Updated last year
- ☆17 Aug 1, 2025 Updated 7 months ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆29 Jul 24, 2025 Updated 7 months ago
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 Jun 27, 2025 Updated 8 months ago
- Official Implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration ☆29 Nov 22, 2025 Updated 3 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆59 Mar 17, 2025 Updated 11 months ago
- Code for paper "Analog Foundation Models" ☆30 Sep 18, 2025 Updated 5 months ago
- ☆45 May 27, 2025 Updated 9 months ago
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity (ACL 2025, oral) ☆30 Jun 14, 2025 Updated 8 months ago
- The official implementation of our paper "CoRe^2: Collect, Reflect and Refine to Generate Better and Faster". ☆30 Mar 19, 2025 Updated 11 months ago
- A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration tech… ☆73 Nov 4, 2025 Updated 3 months ago
- ☆23 Sep 19, 2024 Updated last year
- Differentiable Weightless Neural Networks ☆33 Feb 2, 2026 Updated last month
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆30 Oct 20, 2025 Updated 4 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 Jun 28, 2025 Updated 8 months ago
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 Updated this week
- Esoteric Language Models ☆111 Feb 8, 2026 Updated 3 weeks ago