horseee/LLaMA-Pruning

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/horseee/LLaMA-Pruning)

horseee / LLaMA-Pruning

Structural Pruning for LLaMA

☆54

Alternatives and similar repositories for LLaMA-Pruning

Users that are interested in LLaMA-Pruning are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

JngwenYe / LIRF
View on GitHub
Code for ECCV 2022 paper “Learning with Recoverable Forgetting”
☆21Jul 27, 2022Updated 3 years ago
OliverRensu / DeepMIM
View on GitHub
[WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling
☆56May 10, 2025Updated last year
samuelbroscheit / wikiextractor-wikimentions
View on GitHub
A tool for extracting plain text and internal Wikipedia links from Wikipedia dumps
☆11Apr 18, 2019Updated 7 years ago
DongPoLI / NMS_SoftNMS
View on GitHub
Pytorch、Numpy实现NMS、Soft-NMS代码
☆12Mar 22, 2021Updated 5 years ago
horseee / LLM-Pruner
View on GitHub
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich…
☆1,133Oct 7, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
horseee / CoT-Valve
View on GitHub
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆91Feb 14, 2025Updated last year
Adamdad / Repfusion
View on GitHub
☆60Oct 6, 2023Updated 2 years ago
XiPotatonium / pnr
View on GitHub
Accepted at IJCAI-2022
☆11Sep 3, 2022Updated 3 years ago
Jiang-Yidi / FlatTrajectoryDistillation_FTD
View on GitHub
The code of the paper "Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation" (CVPR2023)
☆18Mar 21, 2023Updated 3 years ago
Yuanshi9815 / LiteFocus
View on GitHub
[Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA model, now implemented with the base model AudioLDM2.
☆34Mar 11, 2025Updated last year
tsa18 / ConciseHint
View on GitHub
[Preprint arXiv: 2506.18810 ] ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation
☆26Oct 1, 2025Updated 9 months ago
Jiang-Yidi / TransformerDistillation-SLU
View on GitHub
☆13Nov 25, 2021Updated 4 years ago
lachlansneff / sparsellama
View on GitHub
☆40Mar 25, 2023Updated 3 years ago
Huage001 / Transfer-Any-Style
View on GitHub
An interactive demo based on Segment-Anything for style transfer which enables different content regions apply different styles.
☆101Apr 24, 2023Updated 3 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
Lexie-YU / ViFeEdit
View on GitHub
[Preprint] ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer
☆67Mar 31, 2026Updated 3 months ago
OpenGVLab / LLMPrune-BESA
View on GitHub
BESA is a differentiable weight pruning technique for large language models.
☆17Mar 4, 2024Updated 2 years ago
Alibaba-NLP / MuVER
View on GitHub
[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations
☆31May 23, 2022Updated 4 years ago
sramshetty / ShortGPT
View on GitHub
Unofficial implementations of block/layer-wise pruning methods for LLMs.
☆78Apr 29, 2024Updated 2 years ago
yu-rp / NeuralLineage
View on GitHub
Code for CVPR 2024 Oral "Neural Lineage"
☆17Jun 18, 2024Updated 2 years ago
YukeWang96 / SGQuant
View on GitHub
SGQuant: Squeezing the Last Bit on Graph Neural Networks with Specialized Quantization
☆11Aug 12, 2020Updated 5 years ago
jongwooko / NASH-Pruning-Official
View on GitHub
Code Implementation for "NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models" (EMNLP …
☆17Oct 17, 2023Updated 2 years ago
Oneflow-Inc / OneFlow-Pruning
View on GitHub
[CVPR-2023] Towards Any Structural Pruning
☆18Apr 27, 2023Updated 3 years ago
horseee / DeepCache
View on GitHub
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
☆970Jun 27, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
SWE-Gym / SWE-Bench-Fork
View on GitHub
☆13Mar 5, 2025Updated last year
MahtabShaan / autoencoder-as-feature-extractor-CIFAR-10
View on GitHub
CIFAR-10 image classification of imbalanced data using bottleneck features extracted from the autoencoder.
☆10Sep 26, 2019Updated 6 years ago
BaiTheBest / SparseLLM
View on GitHub
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
☆70Mar 27, 2025Updated last year
Adamdad / vico
View on GitHub
Vico: Compositional Video Generation as Flow Equalization
☆59Nov 15, 2024Updated last year
OpenNLPLab / TransnormerLLM
View on GitHub
Official implementation of TransNormerLLM: A Faster and Better LLM
☆255Jan 23, 2024Updated 2 years ago
VainF / Torch-Pruning
View on GitHub
[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.
☆3,328Sep 7, 2025Updated 10 months ago
florinshen / Vista3D
View on GitHub
[ECCV2024] Vista3D: Unravel the 3D Darkside of a Single Image
☆57Sep 19, 2024Updated last year
ycjing / AmalgamateGNN.PyTorch
View on GitHub
PyTorch implementation of AmalgamateGNN (CVPR'21)
☆21Jul 29, 2022Updated 3 years ago
vineeths96 / Gradient-Compression
View on GitHub
We present a set of all-reduce compatible gradient compression algorithms which significantly reduce the communication overhead while mai…
☆10Nov 14, 2021Updated 4 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
VainF / Isomorphic-Pruning
View on GitHub
[ECCV 2024] Isomorphic Pruning for Vision Models
☆89Jul 23, 2024Updated 2 years ago
yu-rp / Dimple
View on GitHub
Dimple, the first Discrete Diffusion Multimodal Large Language Model
☆117Jul 9, 2025Updated last year
bytedance / DQ-Det
View on GitHub
Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation
☆38Sep 12, 2023Updated 2 years ago
VITA-Group / llm-kick
View on GitHub
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple.
☆27Apr 21, 2025Updated last year
minicheshire / Robust-Prefix-Tuning
View on GitHub
code for the ICLR'22 paper: On Robust Prefix-Tuning for Text Classification
☆27Mar 21, 2022Updated 4 years ago
zhuyunqi96 / LoraLPrun
View on GitHub
☆13May 21, 2023Updated 3 years ago
chesterzhang / Data-Structures-and-Algorithms
View on GitHub
My Data Structures and Algorithms tutorials and repository.
☆10Feb 19, 2022Updated 4 years ago