[ICLR2026] The first W4A4KV4 quantized + 50% sparse LLMs!
☆26Jan 26, 2026Updated 3 months ago
Alternatives and similar repositories for OBR
Users that are interested in OBR are comparing it to the libraries listed below.
- Official implementation for LaCo (EMNLP 2024 Findings)☆21Oct 3, 2024Updated last year
- [ICML2025] LoRA fine-tune directly on the INT4 models.☆40Nov 25, 2024Updated last year
- [CVPR'26 Findings] Source code for "RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglom…☆45Mar 7, 2026Updated last month
- Implementation of the paper 'Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance' (EMNLP 2025)☆28Dec 16, 2025Updated 4 months ago
- ☆33Jun 7, 2025Updated 10 months ago
- Official Code of The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks [ICML2022]☆16Sep 20, 2022Updated 3 years ago
- Official PyTorch implementation of CD-MOE☆12Mar 18, 2026Updated last month
- This repo contains the code for studying the interplay between quantization and sparsity methods☆26Feb 26, 2025Updated last year
- EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI☆25Jan 17, 2026Updated 3 months ago
- [Official Repo] SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing☆200Apr 13, 2026Updated 3 weeks ago
- A Novel Linear Array Pushbroom (LAP) Image Restoration Method. (Accepted by AAAI 2024)☆12Jan 17, 2024Updated 2 years ago
- Lightning-fast LLM inference engine - Built with Rust (inspiration from https://github.com/GeeeekExplorer/nano-vllm)☆35Jun 24, 2025Updated 10 months ago
- SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs (ICML 2025)☆36Nov 28, 2025Updated 5 months ago
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…☆52Mar 25, 2025Updated last year
- super-resolution; post-training quantization; model compression☆14Nov 10, 2023Updated 2 years ago
- [ICCV-2023] EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization☆28Dec 6, 2023Updated 2 years ago
- ☆15Mar 21, 2025Updated last year
- [ACMMM 2024] Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors☆25Oct 22, 2024Updated last year
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- A CUDA-based implementation of the Alternating Direction Method of Multipliers (ADMM) algorithm to solve Semi-Definite Programming (SDP) …☆40Mar 12, 2026Updated last month
- [NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis☆25Nov 28, 2024Updated last year
- [NeurIPS 2024] COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing☆26Dec 8, 2024Updated last year
- ☆22Nov 26, 2025Updated 5 months ago
- ☆41Nov 22, 2025Updated 5 months ago
- Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"☆21Mar 25, 2026Updated last month
- Implementation of Effective Sparsification of Neural Networks with Global Sparsity Constraint☆31Mar 24, 2022Updated 4 years ago
- A simple neural network written in C++17 with the Eigen library, supporting both forward and backward propagation.☆11Jul 27, 2024Updated last year
- ☆59May 19, 2025Updated 11 months ago
- ☆10Mar 2, 2024Updated 2 years ago
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact"☆45May 24, 2024Updated last year
- PyTorch implementation of Chinese multi-turn dialogue (Cdial) based on GPT + NEZHA☆11Oct 22, 2022Updated 3 years ago
- Lightweight C++ logging library for tracing variables.☆16Feb 10, 2026Updated 2 months ago
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.☆180Apr 24, 2026Updated last week
- ☆16Sep 27, 2023Updated 2 years ago
- ☆21Jun 3, 2023Updated 2 years ago
- Codes for skeleton extraction from point clouds. Created by Xin Li☆18Nov 22, 2021Updated 4 years ago
- [ICCV 2025] QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation☆17Sep 26, 2025Updated 7 months ago
- ☆10Aug 29, 2024Updated last year
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.☆46Jun 11, 2025Updated 10 months ago