★157 · Jun 22, 2023 · Updated 2 years ago
Alternatives and similar repositories for Train_Transformers_with_INT4
Users interested in Train_Transformers_with_INT4 are comparing it to the libraries listed below.
- ★63 · Jul 21, 2024 · Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://arx… ★29 · Feb 17, 2025 · Updated last year
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ★715 · Aug 13, 2024 · Updated last year
- IntLLaMA: A fast and light quantization solution for LLaMA ★18 · Jul 21, 2023 · Updated 2 years ago
- Reorder-based post-training quantization for large language models ★199 · May 17, 2023 · Updated 2 years ago
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) ★19 · Jul 1, 2025 · Updated 8 months ago
- Microsoft Automatic Mixed Precision Library ★635 · Dec 1, 2025 · Updated 3 months ago
- ACL 2023 ★39 · Jun 6, 2023 · Updated 2 years ago
- Code for the paper "A Statistical Framework for Low-bitwidth Training of Deep Neural Networks" ★29 · Oct 31, 2020 · Updated 5 years ago
- Low-bit optimizers for PyTorch ★138 · Oct 9, 2023 · Updated 2 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ★1,629 · Jul 12, 2024 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ★172 · Nov 26, 2025 · Updated 4 months ago
- GPTQ inference Triton kernel ★321 · May 18, 2023 · Updated 2 years ago
- [ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs. ★892 · Nov 26, 2025 · Updated 4 months ago
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT model pre-training ★39 · Jun 20, 2025 · Updated 9 months ago
- Source code of the paper "Robust Quantization: One Model to Rule Them All" ★41 · Mar 24, 2023 · Updated 3 years ago
- The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper "Outlier Suppression: Pushing the Limit of Low-bit Transformer L…" ★49 · Oct 5, 2022 · Updated 3 years ago
- [TMLR] Official PyTorch implementation of the paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" ★38 · Aug 20, 2024 · Updated last year
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ★262 · Aug 9, 2025 · Updated 7 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ★333 · Nov 26, 2025 · Updated 4 months ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ★336 · Jul 2, 2024 · Updated last year
- The official implementation of the ICML 2023 paper OFQ-ViT ★39 · Oct 3, 2023 · Updated 2 years ago
- PB-LLM: Partially Binarized Large Language Models ★156 · Nov 20, 2023 · Updated 2 years ago
- [NeurIPS 2024 Oral] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs ★179 · Oct 3, 2024 · Updated last year
- For releasing code related to compression methods for transformers, accompanying our publications ★456 · Jan 16, 2025 · Updated last year
- Official implementation of the EMNLP 2023 paper "Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…" ★51 · Oct 21, 2023 · Updated 2 years ago
- AFPQ code implementation ★23 · Nov 6, 2023 · Updated 2 years ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ★101 · May 30, 2023 · Updated 2 years ago
- VPTQ: A flexible and extreme low-bit quantization algorithm ★676 · Apr 25, 2025 · Updated 11 months ago
- ★235 · Jun 11, 2024 · Updated last year
- A selective knowledge distillation algorithm for efficient speculative decoders ★36 · Nov 27, 2025 · Updated 4 months ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ★396 · Feb 24, 2024 · Updated 2 years ago
- [TMLR] Official PyTorch implementation of the paper "Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precisio…" ★49 · Sep 27, 2024 · Updated last year
- ★53 · Jul 18, 2024 · Updated last year
- ★35 · Dec 22, 2025 · Updated 3 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ★281 · Nov 3, 2023 · Updated 2 years ago
- ★171 · Mar 9, 2023 · Updated 3 years ago
- [ECCV 2024] SparseRefine: Sparse Refinement for Efficient High-Resolution Semantic Segmentation ★15 · Jan 10, 2025 · Updated last year
- The official implementation of the EMNLP 2023 paper LLM-FP4 ★222 · Dec 15, 2023 · Updated 2 years ago