NoakLiu / LLMEasyQuant
An easy-to-use toolkit for LLM quantization that can run on a MacBook [Efficient ML Model]
☆18 · Updated 4 months ago
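LLMEasyQuant's own API is not shown on this page; as background for what such a toolkit does, the snippet below is an illustrative sketch of symmetric per-tensor int8 weight quantization, the basic operation that weight-quantization toolkits implement. The function names are assumptions for illustration, not the toolkit's actual interface.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    # One shared scale for the whole tensor; guard against an all-zero tensor.
    scale = max(float(np.max(np.abs(w))) / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Round-trip a small weight vector through int8 and back.
w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
```

Per-tensor symmetric scaling is the simplest scheme; practical toolkits often use per-channel scales or asymmetric zero-points to reduce the round-trip error on outlier-heavy LLM weights.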
Alternatives and similar repositories for LLMEasyQuant:
Users interested in LLMEasyQuant are comparing it to the libraries listed below.
- GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System] ☆30 · Updated 5 months ago
- Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model] ☆11 · Updated 2 months ago
- Accelerating Embedding Training on Multitask Scenario [Efficient ML Model] ☆11 · Updated 4 months ago
- Efficient-Large-Foundation-Model-Inference: A Perspective From Model and System Co-Design [Efficient ML System & Model] ☆24 · Updated 2 months ago
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ☆39 · Updated 11 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆248 · Updated 7 months ago
- Official PyTorch implementation of our paper accepted at ICLR 2024 -- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM… ☆47 · Updated last year
- DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.com/p/1218643… ☆16 · Updated 2 months ago
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference ☆37 · Updated 10 months ago
- ☆8 · Updated 8 months ago
- [NeurIPS 2024] The official implementation of "Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation" ☆45 · Updated 4 months ago
- [ECCV 2024] AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer ☆26 · Updated 4 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆65 · Updated last year
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆16 · Updated 4 months ago
- The official code implementation of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" ☆39 · Updated 3 weeks ago
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ☆49 · Updated last year
- ☆10 · Updated last year
- ☆21 · Updated 5 months ago
- Official code implementation for the ICLR 2025 paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives" ☆30 · Updated last month
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆19 · Updated last year
- ☆14 · Updated 2 months ago
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆58 · Updated last month
- ☆43 · Updated 6 months ago
- ☆41 · Updated 10 months ago
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification ☆21 · Updated last month
- ☆50 · Updated 4 months ago
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" ☆25 · Updated last week
- ☆12 · Updated last year
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆41 · Updated last month
- This repo contains the source code for "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs" ☆36 · Updated 8 months ago