csguoh/OBR

[ICLR2026] The first W4A4KV4 quantized + 50% sparse LLMs!

☆33

Alternatives and similar repositories for OBR

Users that are interested in OBR are comparing it to the libraries listed below.

ruikangliu / FlatQuant
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223 · Nov 25, 2025 · Updated 8 months ago
SamsungSAILMontreal / ream
REAM: Merging Improves Pruning of Experts in LLMs
☆22 · Apr 16, 2026 · Updated 3 months ago
BrotherHappy / OSTQuant
[ICLR 2025] OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt…
☆94 · Apr 8, 2025 · Updated last year
IST-DASLab / FP-Quant
☆116 · Feb 26, 2026 · Updated 5 months ago
yangyifei729 / LaCo
Official implementation for LaCo (EMNLP 2024 Findings)
☆22 · Oct 3, 2024 · Updated last year
xuyang-liu16 / MixKV
[ICLR 2026] Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
☆29 · Mar 21, 2026 · Updated 4 months ago
JingyangXiang / DFRot
[COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c…
☆30 · Mar 5, 2025 · Updated last year
wazenmai / HC-SMoE
[ICML 2025] Retraining-Free Merging of Sparse MoE via Hierarchical Clustering
☆25 · Oct 26, 2025 · Updated 9 months ago
lluckydog / blockchainlab2023
☆13 · May 18, 2024 · Updated 2 years ago
zhuohaoyu / ORPS
☆16 · Jul 15, 2025 · Updated last year
Hsu1023 / DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
☆187 · Apr 24, 2026 · Updated 3 months ago
yuxwind / CBS
Official Code of The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks [ICML 2022]
☆16 · Sep 20, 2022 · Updated 3 years ago
gty111 / gLLM
An Efficient and Versatile Inference Engine for Distributed LLM Serving
☆66 · Updated this week
wangjiangshan0725 / Elastic-DiT
[ICML 2026] Elastic Diffusion Transformer: Accelerating SOTA generation models (e.g., Qwen-Image, Hunyuan3d) through adaptive computatio…
☆49 · May 1, 2026 · Updated 2 months ago
csguoh / IntLoRA
[ICML 2025] LoRA fine-tune directly on the INT4 models.
☆41 · Nov 25, 2024 · Updated last year
csguoh / AdaptIR
[NeurIPS 2024] Tune your restoration model with one 3090 GPU!
☆90 · Jan 13, 2025 · Updated last year
XingtongGe / Salt
🧂 [ECCV 2026] Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
☆16 · Apr 6, 2026 · Updated 3 months ago
JHW2000 / JARNet
A Novel Linear Array Pushbroom (LAP) Image Restoration Method. (Accepted by AAAI 2024)
☆12 · Jan 17, 2024 · Updated 2 years ago
parsa-epfl / quantization-sparsity-interplay
This repo contains the code for studying the interplay between quantization and sparsity methods
☆26 · Feb 26, 2025 · Updated last year
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆100 · Feb 21, 2025 · Updated last year
duterscmy / CD-MoE
Official PyTorch implementation of CD-MOE
☆12 · Mar 18, 2026 · Updated 4 months ago
ThisisBillhe / ZipAR
[ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…
☆51 · Mar 25, 2025 · Updated last year
ims-kdks / Learning-to-Parallel-Decoding
[ICLR 2026] Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding
☆34 · Jan 27, 2026 · Updated 6 months ago
zysxmu / DFSQ
super-resolution; post-training quantization; model compression
☆14 · Nov 10, 2023 · Updated 2 years ago
zkkli / HTQ
[PR 2024] HTQ: Exploring the High-Dimensional Trade-Off of Mixed-Precision Quantization
☆12 · Jul 16, 2024 · Updated 2 years ago
A-suozhang / ViDiT-Q
☆15 · Mar 21, 2025 · Updated last year
wu-kan / wuk_cupti_wrapper
A simple API to use CUPTI
☆10 · Aug 19, 2025 · Updated 11 months ago
ZIB-IOL / SMS
Code to reproduce the experiments of the ICLR24-paper: "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"
☆12 · Oct 14, 2025 · Updated 9 months ago
RUCBM / DelTA
Code for Paper 'DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards'
☆17 · May 21, 2026 · Updated 2 months ago
IST-DASLab / gptq-gguf-toolkit
Efficient non-uniform quantization with GPTQ for GGUF
☆64 · Sep 17, 2025 · Updated 10 months ago
seisman / academic-homepage
My academic homepage
☆15 · Jan 15, 2022 · Updated 4 years ago
luuyin / OWL
Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
☆82 · Jul 7, 2025 · Updated last year
Xtra-Computing / XtraMAC
XtraMAC code repo (Accepted by ISCA 2026)
☆17 · May 6, 2026 · Updated 2 months ago
lliai / EMQ-series
[ICCV 2023] EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
☆29 · Dec 6, 2023 · Updated 2 years ago
zysxmu / UnDeM
image demoireing, moire synthesis
☆17 · Apr 25, 2024 · Updated 2 years ago
StiphyJay / MQuant
[ACM MM 2025] MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization
☆44 · Aug 13, 2025 · Updated 11 months ago
SpRegTiling / sparse-register-tiling
☆10 · Mar 2, 2024 · Updated 2 years ago
chase6305 / 7DofSRSKinematics
Kinematics analytical solution and inverse solution for KUKA IIWA 7DOF robot.
☆15 · Jan 13, 2025 · Updated last year
BICLab / MetaLA
Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral)
☆36 · Jan 18, 2025 · Updated last year