guanchuwang / Taylor-Unswift
☆22 · Updated last year
Alternatives and similar repositories for Taylor-Unswift
Users interested in Taylor-Unswift are comparing it to the libraries listed below.
- Official implementation for Yuan, Liu, Zhong et al., "KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…" ☆88 · Updated 10 months ago
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark" ☆122 · Updated 6 months ago
- Awesome list for LLM pruning ☆279 · Updated 3 months ago
- ThinK: Thinner Key Cache by Query-Driven Pruning ☆26 · Updated 11 months ago
- [ICLR 2025 🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models ☆25 · Updated 6 months ago
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" ☆36 · Updated 8 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆259 · Updated last year
- Official Python implementation of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act… ☆17 · Updated last year
- All-in-one repository of awesome LLM pruning papers, integrating useful resources and insights ☆141 · Updated 5 months ago
- ☆49 · Updated last year
- ☆64 · Updated last year
- ☆10 · Updated last year
- A curated list of high-quality papers on resource-efficient LLMs 🌱 ☆154 · Updated 9 months ago
- Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding" ☆214 · Updated 10 months ago
- [ACL'25] Code for the paper "IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory" ☆24 · Updated 10 months ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆350 · Updated 8 months ago
- A curated list of early-exiting papers (LLM, CV, NLP, etc.) ☆69 · Updated last year
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆102 · Updated 6 months ago
- PyTorch implementation of our ICML 2024 paper "CaM: Cache Merging for Memory-efficient LLMs Inference" ☆47 · Updated last year
- Codebase for decoding compressed trust ☆25 · Updated last year
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆50 · Updated 9 months ago
- [NeurIPS 2024 / ICML 2025] LLM Quantization Attacks ☆45 · Updated 4 months ago
- The official implementation of Ada-KV [NeurIPS 2025] ☆123 · Updated last month
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆197 · Updated last month
- ☆23 · Updated last year
- ☆33 · Updated 9 months ago
- ☆299 · Updated 6 months ago
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ☆46 · Updated last year
- Official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) ☆88 · Updated 9 months ago
- Accepted LLM papers at NeurIPS 2024 ☆37 · Updated last year