ThinK: Thinner Key Cache by Query-Driven Pruning
☆27 · Feb 11, 2025 · Updated last year
Alternatives and similar repositories for ThinK
Users who are interested in ThinK are comparing it to the libraries listed below.
- Make reasoning models scalable ☆47 · May 31, 2025 · Updated 9 months ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆21 · May 28, 2024 · Updated last year
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness. ☆56 · May 2, 2025 · Updated 10 months ago
- Official Implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration ☆29 · Nov 22, 2025 · Updated 3 months ago
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆128 · Nov 26, 2025 · Updated 3 months ago
- [ICLR 2025] The official pytorch implement of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ☆71 · Sep 18, 2025 · Updated 5 months ago
- ☆10 · Jan 16, 2025 · Updated last year
- TerDiT: Ternary Diffusion Models with Transformers ☆74 · Jun 17, 2024 · Updated last year
- Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference ☆22 · Feb 9, 2026 · Updated 3 weeks ago
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆36 · Apr 4, 2024 · Updated last year
- [AAAI 2025] Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks ☆11 · Jun 19, 2025 · Updated 8 months ago
- Code for AAAI21 paper "Scalable and Explainable 1-Bit Matrix Completion via Graph Signal Learning" ☆11 · Feb 15, 2022 · Updated 4 years ago
- ☆38 · Feb 20, 2026 · Updated last week
- The repository for the paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs" ☆14 · Dec 16, 2024 · Updated last year
- ☆18 · Mar 2, 2025 · Updated last year
- Pytorch implementation of our paper accepted by ICML 2023 -- "Bi-directional Masks for Efficient N:M Sparse Training" ☆13 · Jun 7, 2023 · Updated 2 years ago
- [ICLR 2025] ELICIT: LLM Augmentation Via External In-context Capability ☆13 · Mar 11, 2025 · Updated 11 months ago
- Code of the paper "Synthesizing Aspect-Driven Recommendation Explanations from Reviews", IJCAI'20 ☆10 · Apr 5, 2024 · Updated last year
- Agent Memory Playground: AI Agent Memory Design & Optimization Techniques ☆32 · Aug 7, 2025 · Updated 6 months ago
- Pytorch code of [CVPR 2023] "NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction". ☆11 · Mar 14, 2023 · Updated 2 years ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆114 · May 24, 2024 · Updated last year
- KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation, NAACL 2024 ☆16 · Jul 29, 2024 · Updated last year
- [COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni… ☆15 · Oct 31, 2025 · Updated 4 months ago
- 🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.… ☆12 · Feb 25, 2025 · Updated last year
- ☆13 · May 12, 2025 · Updated 9 months ago
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators" ☆12 · Mar 25, 2025 · Updated 11 months ago
- Non-Autoregressive Math Word Problem Solver with Unified Tree Structure ☆12 · Jan 13, 2024 · Updated 2 years ago
- SentimentScope ☆10 · Oct 18, 2018 · Updated 7 years ago
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs ☆82 · Jan 17, 2026 · Updated last month
- [NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forget-retain Pareto Optimality ☆19 · Oct 22, 2025 · Updated 4 months ago
- The official implementation of dLLM-Var ☆31 · Nov 6, 2025 · Updated 3 months ago
- Neural Algorithmic Reasoning Tutorial ☆12 · Dec 21, 2022 · Updated 3 years ago
- ☆12 · Sep 1, 2023 · Updated 2 years ago
- Implementation of "Learning Deep Generative Models" ☆12 · Jun 4, 2019 · Updated 6 years ago
- ☆13 · Mar 9, 2024 · Updated last year
- Pytorch implementation of TPAMI 2022 -- 1xN Pattern for Pruning Convolutional Neural Networks ☆42 · Sep 14, 2022 · Updated 3 years ago
- Customized Inference Engine for Multiverse Models ☆24 · Jun 27, 2025 · Updated 8 months ago
- Github Repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition ☆18 · Apr 16, 2025 · Updated 10 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Dec 19, 2024 · Updated last year