Paramathic/slim

SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs (ICML 2025)

☆36

Alternatives and similar repositories for slim

Users who are interested in slim are comparing it to the repositories listed below.

z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆76 · Mar 10, 2026 · Updated 4 months ago
thu-ml / Adaptive-Sparse-Trainer
Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)
☆19 · Jul 1, 2025 · Updated last year
bupt-ai-club / llm-compression-papers
papers of llm compression
☆13 · Mar 6, 2024 · Updated 2 years ago
IST-DASLab / EvoPress
☆43 · Jun 14, 2026 · Updated last month
ChengZhang-98 / LQER
Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs"
☆19 · Jul 11, 2024 · Updated 2 years ago
osehmathias / lisa
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
☆38 · Apr 4, 2024 · Updated 2 years ago
Faraz9877 / H100_GEMM
High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste…
☆11 · Dec 4, 2024 · Updated last year
StarDewXXX / Awesome-Hybrid-CoT-Reasoning
☆62 · Jun 7, 2025 · Updated last year
SpRegTiling / sparse-register-tiling
☆10 · Mar 2, 2024 · Updated 2 years ago
stephenqz / OATS
Github Repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition
☆20 · Apr 16, 2025 · Updated last year
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆100 · Nov 25, 2024 · Updated last year
HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆40 · Sep 24, 2024 · Updated last year
AAzdi / Sparse-BitNet
☆15 · Mar 10, 2026 · Updated 4 months ago
ROIM1998 / APT
[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
☆48 · Jun 4, 2024 · Updated 2 years ago
ArminAzizi98 / LaMDA
☆15 · Nov 7, 2024 · Updated last year
LinkAnonymous / BESA
☆12 · Oct 9, 2023 · Updated 2 years ago
Cattalyya / 3DCoMPaT-challenge
A repo for publishing solution to 3DCoMPaT++ challenge on an improved large-scale 3D vision dataset for compositional recognition
☆14 · Jun 22, 2023 · Updated 3 years ago
microsoft / Moonlit
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
☆88 · Oct 25, 2024 · Updated last year
imagination-research / LCSC
[ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
☆16 · Feb 15, 2025 · Updated last year
OpenGVLab / LLMPrune-BESA
BESA is a differentiable weight pruning technique for large language models.
☆17 · Mar 4, 2024 · Updated 2 years ago
ThisisBillhe / EfficientDM
[ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…
☆73 · Jun 4, 2024 · Updated 2 years ago
thu-nics / MBQ
The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"
☆93 · Mar 17, 2025 · Updated last year
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆99 · Feb 21, 2025 · Updated last year
mit-han-lab / sparserefine
[ECCV 2024] SparseRefine: Sparse Refinement for Efficient High-Resolution Semantic Segmentation
☆16 · Jan 10, 2025 · Updated last year
hyhuang00 / moe_inference
Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".
☆19 · Oct 30, 2024 · Updated last year
spcl / spatial-collectives
Optimized communication collectives for the Cerebras waferscale engine
☆17 · Jun 5, 2024 · Updated 2 years ago
Ratbuyer / h100-features
☆18 · Mar 12, 2025 · Updated last year
bytedance / AffineQuant
Official implementation of the ICLR 2024 paper AffineQuant
☆30 · Mar 30, 2024 · Updated 2 years ago
VITA-Group / R-Sparse
[ICLR'25] R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
☆21 · Apr 28, 2025 · Updated last year
Intelligent-Computing-Lab-Panda / TesseraQ
☆25 · Oct 31, 2024 · Updated last year
z-lab / flash-colreduce
Fast, memory-efficient attention column reduction (e.g., sum, mean, max)
☆49 · Feb 10, 2026 · Updated 5 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆96 · Sep 4, 2024 · Updated last year
enyac-group / UniQL
UniQL official repository (ICLR 2026)
☆16 · Jan 27, 2026 · Updated 5 months ago
AI-Efficiency / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆65 · Apr 15, 2024 · Updated 2 years ago
ruikangliu / FlatQuant
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223 · Nov 25, 2025 · Updated 7 months ago
xzhang9308 / GLVQ
[NeurIPS 2025] Official PyTorch implementation of paper "Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression".
☆15 · Oct 24, 2025 · Updated 8 months ago
mbalesni / deepspeed_llama
Finetuning LLaMA with DeepSpeed
☆10 · Apr 14, 2023 · Updated 3 years ago
anonymouscvpr1983 / GAL
Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
☆18 · Mar 23, 2019 · Updated 7 years ago
yxli2123 / LoSparse
☆64 · Oct 17, 2023 · Updated 2 years ago