facebookresearch / Ternary_Binary_Transformer
ACL 2023
☆39 · Updated Jun 6, 2023
Alternatives and similar repositories for Ternary_Binary_Transformer
Users interested in Ternary_Binary_Transformer are comparing it to the repositories listed below.
- The official implementation of the ICML 2023 paper OFQ-ViT ☆37 · Updated Oct 3, 2023
- [ICML 2023] Official implementation of the paper "BiBench: Benchmarking and Analyzing Network Binar…" ☆56 · Updated Mar 4, 2024
- Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer" ☆114 · Updated Jun 26, 2023
- Code for the paper "Accessing higher dimensions for unsupervised word translation" ☆22 · Updated Jun 26, 2023
- Solution for the Kaggle competition "Feedback Prize - Evaluating Student Writing" ☆16 · Updated Mar 30, 2022
- Code for the accepted NeurIPS 2019 paper "MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization" ☆54 · Updated May 8, 2020
- torch_quantizer, an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models ☆23 · Updated Mar 29, 2024
- The official implementation of the EMNLP 2023 paper LLM-FP4 ☆220 · Updated Dec 15, 2023
- ☆11 · Updated Apr 3, 2023
- ☆12 · Updated Aug 26, 2022
- [CVPR 2022] AlignQ: Alignment Quantization with ADMM-based Correlation Preservation ☆11 · Updated Jan 6, 2023
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆49 · Updated Nov 5, 2024
- ☆85 · Updated Jan 23, 2025
- Faster PyTorch bitsandbytes 4-bit fp4 nn.Linear ops ☆30 · Updated Mar 16, 2024
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆327 · Updated Nov 26, 2025
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆172 · Updated Nov 26, 2025
- ☆157 · Updated Jun 22, 2023
- A new metric for evaluating the faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated Aug 25, 2023
- You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms ☆12 · Updated Apr 17, 2023
- Quantization in the Jagged Loss Landscape of Vision Transformers ☆13 · Updated Oct 22, 2023
- Official implementation of the EMNLP 2023 paper "Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…" ☆50 · Updated Oct 21, 2023
- Unofficial implementation of "Scalable-Softmax Is Superior for Attention" ☆20 · Updated May 30, 2025
- Patch convolution to avoid large GPU memory usage of Conv2D ☆95 · Updated Jan 23, 2025
- The official implementation of the accepted ICLR 2022 paper "BiBERT: Accurate Fully Binarized BERT" ☆89 · Updated Jun 2, 2023
- Implementation of the ICLR 2018 paper "Loss-aware Weight Quantization of Deep Networks" ☆27 · Updated Oct 24, 2019
- Binary neural networks developed by Huawei Noah's Ark Lab ☆29 · Updated Feb 19, 2021
- Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA ☆31 · Updated Nov 27, 2025
- Code for the NeurIPS 2024 paper "QuaRot": end-to-end 4-bit inference of large language models ☆482 · Updated Nov 26, 2024
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…" ☆68 · Updated Mar 7, 2024
- [ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization ☆93 · Updated May 5, 2022
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer ☆30 · Updated Dec 6, 2023
- BESA, a differentiable weight-pruning technique for large language models ☆17 · Updated Mar 4, 2024
- Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models" ☆322 · Updated Mar 4, 2025
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆37 · Updated Sep 24, 2024
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆372 · Updated Feb 14, 2025
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆713 · Updated Aug 13, 2024
- [ICLR 2024] The official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…" ☆39 · Updated Mar 11, 2024
- ProxQuant: Quantized Neural Networks via Proximal Operators ☆30 · Updated Feb 19, 2019
- PB-LLM: Partially Binarized Large Language Models ☆156 · Updated Nov 20, 2023
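Many of the entries above (W4A4/W4A8, 4-bit inference, binarized transformers) build on the same low-bit quantization primitive. As orientation only, and not taken from any listed repository, a minimal NumPy sketch of symmetric per-tensor quantization (the "W4" in W4A4) might look like:

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, bits: int = 4):
    """Symmetric per-tensor quantization: map floats to signed integers.

    Illustrative sketch: real projects typically use per-channel or
    per-group scales and learned clipping rather than a plain abs-max.
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(x)) / qmax    # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

# Example: a tiny weight vector round-tripped through 4-bit
w = np.array([0.31, -0.9, 0.05, 0.7], dtype=np.float32)
q, s = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, s)  # each entry is within half a scale step of w
```

Weight-activation schemes (W4A4, W4A8) apply the same idea to activations, either with scales fixed at calibration time (static) or computed per input (dynamic).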