mit-han-lab / pruning-sparsity-publications

☆24

Alternatives and similar repositories for pruning-sparsity-publications

Users who are interested in pruning-sparsity-publications compare it to the repositories listed below.

SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆115 Updated 2 months ago
tiingweii-shii / Awesome-Resource-Efficient-LLM-Papers
A curated list of high-quality papers on resource-efficient LLMs 🌱
☆139 Updated 6 months ago
UbiquitousLearning / Paper-list-resource-efficient-large-language-model
☆100 Updated last year
IST-DASLab / OBC
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
☆126 Updated 2 years ago
parsa-epfl / quantization-sparsity-interplay
This repo contains the code for studying the interplay between quantization and sparsity methods
☆23 Updated 7 months ago
1hunters / LIMPQ
Official implementation for ECCV 2022 paper LIMPQ - "Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance"
☆61 Updated 2 years ago
mit-han-lab / parallel-computing-tutorial
☆172 Updated 2 years ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆222 Updated 2 years ago
Hao840 / Awesome-Low-Precision-Training
A collection of research papers on low-precision training methods
☆37 Updated 4 months ago
thu-nics / qllm-eval
Code Repository of Evaluating Quantized Large Language Models
☆131 Updated last year
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆66 Updated last year
IntelLabs / Hardware-Aware-Automated-Machine-Learning
☆68 Updated last month
pprp / Awesome-LLM-Quantization
Awesome list for LLM quantization
☆309 Updated this week
Qualcomm-AI-research / FP8-quantization
☆159 Updated 2 years ago
mit-han-lab / tinychat-tutorial
☆72 Updated 10 months ago
UbiquitousLearning / Efficient_Foundation_Model_Survey
Survey Paper List - Efficient LLM and Foundation Models
☆257 Updated last year
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆57 Updated 6 months ago
GATECH-EIC / Edge-LLM
[DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La…
☆70 Updated last year
Kyrie-Zhao / awesome-real-time-AI
A list of awesome papers related to edge AI inference.
☆98 Updated last year
microsoft / chunk-attention
☆78 Updated 5 months ago
imagination-research / EEP
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
☆18 Updated 9 months ago
Dao-AILab / fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
☆233 Updated 3 weeks ago
falcon-xu / early-exit-papers
A curated list of early-exiting papers (LLM, CV, NLP, etc.)
☆62 Updated last year
Qualcomm-AI-research / transformer-quantization
☆206 Updated 3 years ago
naver-aics / lut-gemm
☆71 Updated last year
hyhuang00 / moe_inference
Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".
☆19 Updated 10 months ago
FMInference / DejaVu
☆338 Updated last year
INT-FlashAttention2024 / INT-FlashAttention
☆82 Updated 8 months ago
wimh966 / outlier_suppression
The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L…
☆48 Updated 2 years ago
zyxxmu / cam
PyTorch implementation of our paper accepted by ICML 2024: "CaM: Cache Merging for Memory-efficient LLMs Inference"
☆45 Updated last year