aredden / torch-bnb-fp4
Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops
☆27 · Updated last year
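The tagline refers to speeding up 4-bit FP4-quantized linear layers. As background, here is a minimal, self-contained sketch of the idea behind FP4 weight quantization: weights are absmax-scaled and snapped to a 16-entry 4-bit value grid. This is an illustration only, in plain Python with a hypothetical E2M1-style codebook; it is not the torch-bnb-fp4 or bitsandbytes implementation, which use fused CUDA kernels and a different codebook.

```python
# Hypothetical FP4 (1 sign, 2 exponent, 1 mantissa bit) value grid,
# normalized so the largest magnitude is 1.0. The actual bitsandbytes
# FP4 codebook differs; this is for illustration only.
_POS = [0.0, 1/12, 1/6, 0.25, 1/3, 0.5, 2/3, 1.0]  # E2M1 {0,0.5,1,1.5,2,3,4,6} / 6
FP4_CODEBOOK = _POS + [-v for v in _POS]           # 16 values -> 4 bits per weight


def quantize_fp4(weights):
    """Absmax-scale the weights, then map each to the index of the
    nearest codebook value. Returns (4-bit codes, scale)."""
    absmax = max(abs(w) for w in weights) or 1.0
    codes = [
        min(range(16), key=lambda i: abs(w / absmax - FP4_CODEBOOK[i]))
        for w in weights
    ]
    return codes, absmax


def dequantize_fp4(codes, absmax):
    """Recover approximate weights from the 4-bit codes and the scale."""
    return [FP4_CODEBOOK[c] * absmax for c in codes]
```

Values that land exactly on the grid round-trip losslessly; everything else incurs the usual quantization error. The point of libraries like torch-bnb-fp4 is to perform the dequantize-and-matmul step in a single fast kernel rather than materializing the dequantized weights as above.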
Alternatives and similar repositories for torch-bnb-fp4:
Users interested in torch-bnb-fp4 are comparing it to the libraries listed below.
- Repository for sparse finetuning of LLMs via a modified version of MosaicML's llmfoundry ☆40 · Updated last year
- ☆157 · Updated last year
- QuIP quantization ☆52 · Updated last year
- ☆116 · Updated last month
- Load compute kernels from the Hub ☆99 · Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 ☆45 · Updated 8 months ago
- ☆95 · Updated 9 months ago
- ☆65 · Updated 2 months ago
- ☆23 · Updated 8 months ago
- This repository contains the experimental PyTorch-native float8 training UX ☆222 · Updated 7 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity ☆71 · Updated 6 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance ☆104 · Updated this week
- Repository for CPU kernel generation for LLM inference ☆25 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆121 · Updated last month
- A library for unit scaling in PyTorch ☆124 · Updated 3 months ago
- Official implementation of the ICLR 2024 paper AffineQuant ☆25 · Updated 11 months ago
- Low-bit optimizers for PyTorch ☆125 · Updated last year
- Work in progress ☆50 · Updated last week
- Code for studying the super weight in LLMs ☆94 · Updated 3 months ago
- [WIP] Better (FP8) attention for Hopper ☆26 · Updated last month
- This repository contains the training code of ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆37 · Updated 3 weeks ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆33 · Updated 6 months ago
- PyTorch bindings for CUTLASS grouped GEMM ☆74 · Updated 4 months ago
- PB-LLM: Partially Binarized Large Language Models ☆152 · Updated last year
- ☆101 · Updated 6 months ago
- Patch convolution to avoid large GPU memory usage of Conv2D ☆84 · Updated 2 months ago
- Experiment of using Tangent to autodiff Triton ☆78 · Updated last year
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… ☆47 · Updated last year
- Fast matrix multiplications for lookup-table-quantized LLMs ☆234 · Updated last month
- Low-Rank Llama custom training ☆22 · Updated 11 months ago