Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops
☆30 · Mar 16, 2024 · Updated last year
Alternatives and similar repositories for torch-bnb-fp4
Users who are interested in torch-bnb-fp4 are comparing it to the libraries listed below.
- IntLLaMA: A fast and light quantization solution for LLaMA · ☆18 · Jul 21, 2023 · Updated 2 years ago
- Code Repository for the NeurIPS 2022 paper: "Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights". · ☆18 · Jul 10, 2024 · Updated last year
- Unofficial implementation of "Scalable-Softmax Is Superior for Attention" · ☆20 · May 30, 2025 · Updated 9 months ago
- ACL 2023 · ☆39 · Jun 6, 2023 · Updated 2 years ago
- TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS · ☆16 · Sep 26, 2017 · Updated 8 years ago
- Parallel Associative Scan for Language Models · ☆18 · Jan 8, 2024 · Updated 2 years ago
- [NeurIPS 2022] “Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation”, Ziyu Jiang*, Xuxi Chen*, Xueqin Huan… · ☆19 · Mar 14, 2023 · Updated 2 years ago
- [EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut… · ☆23 · Dec 4, 2024 · Updated last year
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023) · ☆22 · Nov 1, 2023 · Updated 2 years ago
- A collection of model quantization algorithms. For any issues, please contact Peng Chen (blueardour@gmail.com) · ☆45 · Aug 19, 2021 · Updated 4 years ago
- The official implementation of the paper "Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation" · ☆20 · Dec 10, 2024 · Updated last year
- The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L… · ☆49 · Oct 5, 2022 · Updated 3 years ago
- ☆21 · Feb 11, 2022 · Updated 4 years ago
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models · ☆103 · Mar 12, 2024 · Updated last year
- Post-Training Quantization for Vision transformers. · ☆238 · Jul 19, 2022 · Updated 3 years ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… · ☆31 · Mar 12, 2024 · Updated last year
- [TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" · ☆37 · Aug 20, 2024 · Updated last year
- DeiT implementation for Q-ViT · ☆25 · Apr 21, 2025 · Updated 10 months ago
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers. · ☆11 · Aug 30, 2024 · Updated last year
- High-speed GEMV kernels with up to 2.7x speedup over the PyTorch baseline · ☆128 · Jul 13, 2024 · Updated last year
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… · ☆41 · Sep 9, 2025 · Updated 5 months ago
- Implementation of ICLR 2018 paper "Loss-aware Weight Quantization of Deep Networks" · ☆27 · Oct 24, 2019 · Updated 6 years ago
- Solve puzzles. Learn CUDA. · ☆62 · Dec 13, 2023 · Updated 2 years ago
- Comprehensive analysis of the differences in performance between QLoRA, LoRA, and full fine-tunes. · ☆83 · Sep 10, 2023 · Updated 2 years ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation · ☆151 · Mar 21, 2025 · Updated 11 months ago
- Knowledge Graph Generator app · ☆34 · Apr 18, 2024 · Updated last year
- ☆11 · Dec 23, 2024 · Updated last year
- Concurrency library · ☆17 · Oct 13, 2024 · Updated last year
- Implementation of Post-training Quantization on Diffusion Models (CVPR 2023) · ☆141 · Apr 1, 2023 · Updated 2 years ago
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". · ☆12 · Oct 14, 2024 · Updated last year
- Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models. · ☆485 · Nov 26, 2024 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX · ☆226 · Aug 1, 2024 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… · ☆39 · Mar 11, 2024 · Updated last year
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity · ☆71 · Jul 5, 2025 · Updated 8 months ago
- An extensible collectives library in Triton · ☆96 · Mar 31, 2025 · Updated 11 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache · ☆358 · Nov 20, 2025 · Updated 3 months ago
- ☆235 · Jun 11, 2024 · Updated last year
- PyTorch interface for TrueGrad Optimizers · ☆43 · Aug 8, 2023 · Updated 2 years ago
- This is the official PyTorch implementation of the paper: Towards Accurate Post-training Quantization for Diffusion Models. (CVPR24 Poste… · ☆38 · Jun 4, 2024 · Updated last year