ThinK: Thinner Key Cache by Query-Driven Pruning
☆30 · Jun 2, 2026 · Updated last month
Alternatives and similar repositories for ThinK
Users who are interested in ThinK are comparing it to the libraries listed below.
- Make reasoning models scalable ☆51 · Jun 2, 2026 · Updated last month
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆22 · May 28, 2024 · Updated 2 years ago
- VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models ☆26 · Mar 26, 2025 · Updated last year
- PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation [NeurIPS 2025] ☆19 · Oct 11, 2025 · Updated 8 months ago
- [ACL Findings 2026] Official Implementation of "FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acc… ☆32 · Apr 14, 2026 · Updated 2 months ago
- TerDiT: Ternary Diffusion Models with Transformers ☆76 · Jun 17, 2024 · Updated 2 years ago
- ☆116 · Jan 11, 2026 · Updated 5 months ago
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆137 · Nov 26, 2025 · Updated 7 months ago
- [ICLR 2025] The official pytorch implement of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ☆72 · Sep 18, 2025 · Updated 9 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆38 · Jul 11, 2024 · Updated last year
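- Several classic examples in AI, ML, DA, DV, and related areas, including traffic jam simulation (NagelSchreckenberg), the Monte Carlo queuing problem (Monte Carlo Queuing Problem), face recognition (RecognitionFace), and image inference with a genetic algorithm (IconGenetic) ☆10 · Oct 14, 2018 · Updated 7 years ago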
- ☆33 · Feb 8, 2026 · Updated 4 months ago
- vortex particles for simulating smoke in 2d ☆17 · Dec 13, 2021 · Updated 4 years ago
- ☆14 · Mar 11, 2024 · Updated 2 years ago
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding ☆23 · Mar 2, 2025 · Updated last year
- CFG-GAN: Composite functional gradient learning of generative adversarial models ☆15 · Jul 9, 2020 · Updated 5 years ago
- ☆49 · May 9, 2026 · Updated last month
- Pytorch implementation of TPAMI 2022 -- 1xN Pattern for Pruning Convolutional Neural Networks ☆42 · Sep 14, 2022 · Updated 3 years ago
- ☆13 · Mar 28, 2025 · Updated last year
- Extension of libSVM to support Open Set Recognition as described in "Toward Open Set Recognition", TPAMI July 2013 ☆12 · Oct 21, 2013 · Updated 12 years ago
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆38 · Apr 4, 2024 · Updated 2 years ago
- Papers of Implicit Reasoning in LLMs. ☆25 · Mar 13, 2025 · Updated last year
- [NeurIPS 2024] Search for Efficient LLMs ☆16 · Jan 16, 2025 · Updated last year
- AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference ☆20 · Jan 24, 2025 · Updated last year
- ☆12 · May 15, 2025 · Updated last year
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs ☆87 · Jan 17, 2026 · Updated 5 months ago
- My record about learning the course MIT-6.824 ☆13 · Mar 28, 2022 · Updated 4 years ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆67 · Sep 28, 2024 · Updated last year
- An Empirical Study of Memorization in NLP (ACL 2022) ☆13 · Jun 22, 2022 · Updated 4 years ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆78 · Apr 29, 2024 · Updated 2 years ago
- Hinton's Forward-Forward Algorithm Implementation PyTorch ☆12 · Sep 4, 2023 · Updated 2 years ago
- This is the model zoo for our CVPR 2023 paper: EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention ☆14 · Mar 13, 2024 · Updated 2 years ago
- ☆12 · Sep 1, 2023 · Updated 2 years ago
- ☆30 · Sep 23, 2025 · Updated 9 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆36 · Jun 2, 2026 · Updated last month
- Spectral Graph Attention Network with Fast Eigen-approximation ☆11 · Dec 24, 2021 · Updated 4 years ago
- Official Implementation (Pytorch) of the "Generative Subgraph Retrieval for Knowledge Graph-Grounded Dialog Generation", EMNLP 2024 (main… ☆12 · Mar 10, 2025 · Updated last year
- Code for the paper "Complete Dictionary Learning via \ell_p-norm Maximization", Yifei Shen*, Ye Xue*, Jun Zhang, Khaled B. … ☆11 · Nov 21, 2020 · Updated 5 years ago
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators" ☆12 · Mar 25, 2025 · Updated last year