ACL 2023 · ☆39 · updated Jun 6, 2023
Alternatives and similar repositories for Ternary_Binary_Transformer
Users interested in Ternary_Binary_Transformer are comparing it to the libraries listed below.
- The official implementation of the ICML 2023 paper OFQ-ViT (☆39, updated Oct 3, 2023)
- [TMLR] Official PyTorch implementation of "Efficient Quantization-aware Training with Adaptive Coreset Selection" (☆39, updated Aug 20, 2024)
- Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer" (☆114, updated Jun 26, 2023)
- [ICML 2023] Official implementation of "BiBench: Benchmarking and Analyzing Network Binar…" (☆56, updated Mar 4, 2024)
- The official implementation of the EMNLP 2023 paper LLM-FP4 (☆224, updated Dec 15, 2023)
- You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms (☆12, updated Apr 17, 2023)
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models (☆339, updated Apr 10, 2026)
- Code for the NeurIPS 2019 paper "MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization" (☆54, updated May 8, 2020)
- Official implementation of the EMNLP 2023 paper "Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…" (☆51, updated Oct 21, 2023)
- ☆12, updated Aug 26, 2022
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization (☆171, updated Nov 26, 2025)
- Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket (☆67, updated Feb 12, 2023)
- QAQ: Quality Adaptive Quantization for LLM KV Cache (☆54, updated Mar 27, 2024)
- Official implementation of the ICLR 2022 paper "BiBERT: Accurate Fully Binarized BERT" (☆89, updated Jun 2, 2023)
- ☆11, updated Apr 3, 2023
- Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models" (☆324, updated Mar 4, 2025)
- ProxQuant: Quantized Neural Networks via Proximal Operators (☆30, updated Feb 19, 2019)
- torch_quantizer: an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models (☆25, updated Mar 29, 2024)
- ☆87, updated Jan 23, 2025
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization (☆39, updated Sep 24, 2024)
- Implementation of the ICLR 2018 paper "Loss-aware Weight Quantization of Deep Networks" (☆27, updated Oct 24, 2019)
- Model Compression Toolbox for Large Language Models and Diffusion Models (☆779, updated Aug 14, 2025)
- ☆157, updated Jun 22, 2023
- Binary neural networks developed by Huawei Noah's Ark Lab (☆29, updated Feb 19, 2021)
- [ECCV 2022] Data-Free Neural Architecture Search via Recursive Label Calibration (☆33, updated Sep 13, 2022)
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models (☆49, updated Nov 5, 2024)
- Quantization in the Jagged Loss Landscape of Vision Transformers (☆13, updated Oct 22, 2023)
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (☆229, updated Jan 11, 2025)
- High-performance FP8 GEMM kernels for SM89 and later GPUs (☆21, updated Jan 24, 2025)
- Residual vector quantization for KV cache compression in large language models (☆12, updated Oct 22, 2024)
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…" (☆40, updated Mar 11, 2024)
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2-bit Quantization for KV Cache (☆390, updated Nov 20, 2025)
- Training code for ParetoQ, introduced in "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" (☆123, updated Oct 15, 2025)
- Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models (☆506, updated Nov 26, 2024)
- [CVPR 2022] AlignQ: Alignment Quantization with ADMM-based Correlation Preservation (☆11, updated Jan 6, 2023)
- [CVPR 2021] S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration (☆65, updated Aug 18, 2021)
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization (☆718, updated Aug 13, 2024)
- Official PyTorch and Diffusers implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image" (☆315, updated Dec 23, 2024)
- [MLSys 2024] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving (☆338, updated Jul 2, 2024)
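Most of the repositories above build on low-bit weight quantization. As a rough orientation, here is a minimal sketch of the ternary idea (TWN-style thresholding) that underlies ternary/binary Transformer work; the 0.7 threshold factor is a common heuristic, and the function name and scale formula are illustrative assumptions, not taken from any listed repo:

```python
def ternarize(weights, threshold_factor=0.7):
    """Map each float weight to {-alpha, 0, +alpha} (hypothetical sketch).

    delta is a magnitude threshold below which weights are zeroed;
    alpha is a shared scale, the mean magnitude of the surviving weights.
    """
    abs_w = [abs(w) for w in weights]
    delta = threshold_factor * sum(abs_w) / len(weights)  # sparsity threshold
    kept = [a for a in abs_w if a > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0        # shared scale

    def quantize(w):
        if w > delta:
            return alpha
        if w < -delta:
            return -alpha
        return 0.0

    return [quantize(w) for w in weights]


# Small weights collapse to 0; large ones share a single magnitude.
print(ternarize([0.9, -0.8, 0.05, 0.0]))  # roughly [0.85, -0.85, 0.0, 0.0]
```

Binary variants drop the zero level and keep only {-alpha, +alpha}, which is where the accuracy/compression trade-off explored by the BiT, BiBERT, and BiBench entries comes in.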