[ICLR2026] The first W4A4KV4 quantized + 50% sparse LLMs!
☆32 · Jan 26, 2026 · Updated 4 months ago
Alternatives and similar repositories for OBR
Users who are interested in OBR are comparing it to the libraries listed below.
- [ICCV2025] Generate one 2K image on a single 24GB 3090 GPU! ☆87 · Sep 8, 2025 · Updated 9 months ago
- An official implementation of "Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards" ☆36 · Oct 3, 2025 · Updated 8 months ago
- [ICML2025] LoRA fine-tuning directly on INT4 models. ☆41 · Nov 25, 2024 · Updated last year
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) ☆19 · Jul 1, 2025 · Updated 11 months ago
- Qwen3-0.6B megakernel: 527 tok/s decode on RTX 3090 (3.8x faster than PyTorch) ☆110 · Feb 10, 2026 · Updated 4 months ago
- [CVPR'26 Findings] Source code for "RADSeg Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglom… ☆58 · May 31, 2026 · Updated 2 weeks ago
- Minute-long video generation at 24FPS. ☆68 · Mar 28, 2026 · Updated 2 months ago
- Official repository for the paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆99 · Feb 21, 2025 · Updated last year
- [ICML 2026] Elastic Diffusion Transformer: Accelerating SOTA generation models (e.g., Qwen-Image, Hunyuan3d) through adaptive computatio… ☆44 · May 1, 2026 · Updated last month
- image demoireing, moire synthesis ☆17 · Apr 25, 2024 · Updated 2 years ago
- A Novel Linear Array Pushbroom (LAP) Image Restoration Method. (Accepted by AAAI 2024) ☆12 · Jan 17, 2024 · Updated 2 years ago
- super-resolution; post-training quantization; model compression ☆14 · Nov 10, 2023 · Updated 2 years ago
- ☆15 · Mar 21, 2025 · Updated last year
- My academic homepage ☆15 · Jan 15, 2022 · Updated 4 years ago
- Benchmark tests supporting the TiledCUDA library. ☆19 · Nov 19, 2024 · Updated last year
- Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs" ☆21 · Mar 25, 2026 · Updated 2 months ago
- Implementation of "Effective Sparsification of Neural Networks with Global Sparsity Constraint" ☆31 · Mar 24, 2022 · Updated 4 years ago
- ☆15 · Jan 18, 2026 · Updated 4 months ago
- ☆62 · May 19, 2025 · Updated last year
- Automated sum-of-squares (SOS) Prover for Algebraic Inequalities | Python-based tool with GUI & API | Generates readable sum-of-squares p… ☆35 · Jun 8, 2026 · Updated last week
- Created a simple neural network using the C++17 standard and the Eigen library that supports both forward and backward propagation. ☆11 · Jul 27, 2024 · Updated last year
- PyTorch implementation of our paper accepted by ECCV 2022: Fine-grained Data Distribution Alignment for Post-Training Quantization ☆16 · Sep 13, 2022 · Updated 3 years ago
- ☆10 · Mar 2, 2024 · Updated 2 years ago
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact" ☆45 · May 24, 2024 · Updated 2 years ago
- A simple and efficient memory pool implemented in C++11. ☆10 · Jun 2, 2022 · Updated 4 years ago
- We present Global Search Optics (GSO) to automatically design compact computational imaging systems. ☆14 · Mar 19, 2025 · Updated last year
- Event batch estimation from adaptive global decay process ☆13 · May 29, 2023 · Updated 3 years ago
- Lightweight C++ logging library for tracing variables. ☆16 · Feb 10, 2026 · Updated 4 months ago
- ☆16 · Sep 27, 2023 · Updated 2 years ago
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆181 · Apr 24, 2026 · Updated last month
- [NeurIPS2024] Overcome hallucination of diffusion restoration models. ☆66 · Apr 14, 2025 · Updated last year
- Database kernel notes ☆14 · Aug 18, 2022 · Updated 3 years ago
- An implementation of run-length encoding for PyTorch tensors using CUDA ☆14 · Apr 7, 2021 · Updated 5 years ago
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA cores for the decoding stage of LLM inference. ☆47 · Jun 11, 2025 · Updated last year
- ☆10 · Aug 29, 2024 · Updated last year
- Official implementation, datasets and trained models of "SegNeuron: 3D Neuron Instance Segmentation in Any EM Volume with a Generalist Mo… ☆23 · Jun 1, 2026 · Updated 2 weeks ago
- Terminal UI for NVIDIA Nsight Systems profiles: timeline viewer, kernel navigator, NVTX hierarchy ☆58 · Updated this week
- SparseGPT + GPTQ compression of LLMs such as LLaMA, OPT, and Pythia ☆42 · Mar 13, 2023 · Updated 3 years ago
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆38 · Apr 4, 2024 · Updated 2 years ago