ACL 2023
☆39 · Jun 6, 2023 · Updated 2 years ago
Alternatives and similar repositories for Ternary_Binary_Transformer
Users interested in Ternary_Binary_Transformer are comparing it to the libraries listed below.
- The official implementation of the ICML 2023 paper OFQ-ViT ☆39 · Oct 3, 2023 · Updated 2 years ago
- [TMLR] Official PyTorch implementation of the paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" ☆38 · Aug 20, 2024 · Updated last year
- Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer" ☆114 · Jun 26, 2023 · Updated 2 years ago
- [ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binar… ☆56 · Mar 4, 2024 · Updated 2 years ago
- The official implementation of the EMNLP 2023 paper LLM-FP4 ☆222 · Dec 15, 2023 · Updated 2 years ago
- You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms ☆12 · Apr 17, 2023 · Updated 2 years ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆333 · Nov 26, 2025 · Updated 4 months ago
- Code for the accepted NeurIPS 2019 paper "MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization" ☆54 · May 8, 2020 · Updated 5 years ago
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… ☆51 · Oct 21, 2023 · Updated 2 years ago
- Code for the paper "Accessing higher dimensions for unsupervised word translation" ☆22 · Jun 26, 2023 · Updated 2 years ago
- ☆12 · Aug 26, 2022 · Updated 3 years ago
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆172 · Nov 26, 2025 · Updated 4 months ago
- Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket ☆68 · Feb 12, 2023 · Updated 3 years ago
- QAQ: Quality Adaptive Quantization for LLM KV Cache ☆53 · Mar 27, 2024 · Updated 2 years ago
- Patch convolution to avoid large GPU memory usage of Conv2D ☆95 · Jan 23, 2025 · Updated last year
- This project is the official implementation of our accepted ICLR 2022 paper "BiBERT: Accurate Fully Binarized BERT". ☆89 · Jun 2, 2023 · Updated 2 years ago
- Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models" ☆323 · Mar 4, 2025 · Updated last year
- ☆11 · Apr 3, 2023 · Updated 2 years ago
- ProxQuant: Quantized Neural Networks via Proximal Operators ☆30 · Feb 19, 2019 · Updated 7 years ago
- torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models ☆25 · Mar 29, 2024 · Updated last year
- ☆88 · Jan 23, 2025 · Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆38 · Sep 24, 2024 · Updated last year
- Implementation of the ICLR 2018 paper "Loss-aware Weight Quantization of Deep Networks" ☆27 · Oct 24, 2019 · Updated 6 years ago
- Model Compression Toolbox for Large Language Models and Diffusion Models ☆764 · Aug 14, 2025 · Updated 7 months ago
- ☆157 · Jun 22, 2023 · Updated 2 years ago
- Binary neural networks developed by Huawei Noah's Ark Lab ☆29 · Feb 19, 2021 · Updated 5 years ago
- Data-Free Neural Architecture Search via Recursive Label Calibration. ECCV 2022. ☆33 · Sep 13, 2022 · Updated 3 years ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆49 · Nov 5, 2024 · Updated last year
- Quantization in the Jagged Loss Landscape of Vision Transformers ☆13 · Oct 22, 2023 · Updated 2 years ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆229 · Jan 11, 2025 · Updated last year
- High-performance FP8 GEMM kernels for SM89 and later GPUs ☆20 · Jan 24, 2025 · Updated last year
- Residual vector quantization for KV cache compression in large language models ☆12 · Oct 22, 2024 · Updated last year
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆363 · Nov 20, 2025 · Updated 4 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆39 · Mar 11, 2024 · Updated 2 years ago
- This repository contains the training code of ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆119 · Oct 15, 2025 · Updated 5 months ago
- Code for the NeurIPS 2024 paper "QuaRot": end-to-end 4-bit inference of large language models ☆492 · Nov 26, 2024 · Updated last year
- [CVPR 2022] AlignQ: Alignment Quantization with ADMM-based Correlation Preservation ☆11 · Jan 6, 2023 · Updated 3 years ago
- S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration (CVPR 2021) ☆65 · Aug 18, 2021 · Updated 4 years ago
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆713 · Aug 13, 2024 · Updated last year
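Many of the ternary- and binary-weight repositories above (including Ternary_Binary_Transformer itself) share one core operation: mapping full-precision weights to {-α, 0, +α}. The sketch below is a generic TWN-style heuristic (mean-magnitude threshold, shared scale) for illustration only — the `ternarize` function and `thresh_factor` parameter are hypothetical names, and each repo uses its own exact scheme.

```python
def ternarize(weights, thresh_factor=0.7):
    """Map weights to {-alpha, 0, +alpha} using a mean-magnitude threshold.
    Illustrative TWN-style heuristic, not any one repo's exact method."""
    mags = [abs(w) for w in weights]
    delta = thresh_factor * sum(mags) / len(mags)   # pruning threshold
    kept = [m for m in mags if m > delta]           # magnitudes that survive
    alpha = sum(kept) / len(kept) if kept else 0.0  # shared scale factor
    return [alpha * (1 if w > 0 else -1) if abs(w) > delta else 0.0
            for w in weights]

# Small weights are zeroed; large ones collapse to +/-alpha.
print(ternarize([0.9, -0.05, 0.4, -0.8, 0.02]))
```

Binarization (as in BiBERT or BiT above) is the degenerate case with no zero level: every weight becomes ±α.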