RUCKBReasoning / LLM-Streamline
Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models"
☆19 · Updated last week
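The paper's core observation is that some contiguous transformer layers barely change their input hidden states, so they can be pruned and replaced by a single lightweight module. Below is a minimal, unofficial sketch of the redundancy-scoring step under that reading of the paper: it ranks layer spans by the cosine similarity between the hidden states entering and leaving each span. The model name and calibration text are placeholders, and the replacement-module training is omitted.

```python
# Minimal, unofficial sketch of the layer-redundancy idea behind LLM-Streamline:
# score each contiguous span of transformer layers by the cosine similarity
# between the hidden states entering and leaving the span on calibration text,
# then flag the most redundant span as a pruning candidate. The model name and
# calibration sentence below are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def span_redundancy(model, input_ids, span: int):
    """For each start layer i, return the mean cosine similarity between the
    hidden states before layer i and after layer i + span - 1."""
    hs = model(input_ids, output_hidden_states=True).hidden_states
    n_layers = len(hs) - 1  # one entry per layer, plus the embedding output
    scores = []
    for i in range(n_layers - span + 1):
        sim = torch.nn.functional.cosine_similarity(hs[i], hs[i + span], dim=-1)
        scores.append(sim.mean().item())
    return scores

tok = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
ids = tok("Some calibration text for redundancy scoring.", return_tensors="pt").input_ids
scores = span_redundancy(model, ids, span=2)
best = max(range(len(scores)), key=scores.__getitem__)
print(f"Most redundant 2-layer span starts at layer {best} (cos sim = {scores[best]:.3f})")
# The paper then trains a lightweight module to stand in for the pruned span;
# that training step is omitted here.
```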
Alternatives and similar repositories for LLM-Streamline:
Users interested in LLM-Streamline are comparing it to the repositories listed below
- ☆18 · Updated 4 months ago
- Official implementation for LaCo (EMNLP 2024 Findings) ☆16 · Updated 6 months ago
- A block pruning framework for LLMs. ☆22 · Updated 9 months ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic Low-Rank Sparse Adaptation for Large Language Models" ☆16 · Updated last month
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆120 · Updated last month
- [ICLR 2025] Code and data repo for the paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ☆42 · Updated 4 months ago
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition ☆11 · Updated last week
- Official PyTorch implementation of our paper accepted at ICLR 2024, "Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs" ☆47 · Updated last year
- AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR 2024. ☆77 · Updated 5 months ago
- ☆15 · Updated 5 months ago
- Awesome-Low-Rank-Adaptation ☆93 · Updated 6 months ago
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ☆37 · Updated 10 months ago
- ☆10 · Updated last year
- ☆88 · Updated 3 months ago
- Model merging is a highly efficient approach for long-to-short reasoning. ☆40 · Updated 3 weeks ago
- Chain-of-Thought (CoT) is so hot, and so long! We need shorter reasoning processes! ☆48 · Updated 2 weeks ago
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆54 · Updated 3 weeks ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ☆45 · Updated last month
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆62 · Updated 2 months ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆56 · Updated last month
- Code for the ACL 2024 paper "SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models" ☆34 · Updated 3 months ago
- Official implementation for Yuan, Liu, Zhong et al., "KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches" ☆70 · Updated last month
- ☆39 · Updated 4 months ago
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆31 · Updated last year
- Representation Surgery for Multi-Task Model Merging. ICML 2024. ☆44 · Updated 6 months ago
- Implementation code for the ACL 2024 paper "Advancing Parameter Efficiency in Fine-tuning via Representation Editing" ☆13 · Updated 11 months ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆43 · Updated 5 months ago
- ☆50 · Updated this week
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ☆47 · Updated last year
- ☆11 · Updated last week