aredden / torch-bnb-fp4
Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops
☆26 Updated 10 months ago
Alternatives and similar repositories for torch-bnb-fp4:
Users who are interested in torch-bnb-fp4 are comparing it to the libraries listed below:
- ☆88 Updated 8 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 Updated last year
- ☆157 Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 Updated 5 months ago
- ☆60 Updated 3 weeks ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆228 Updated this week
- Repository for CPU Kernel Generation for LLM Inference ☆25 Updated last year
- QuIP quantization ☆48 Updated 10 months ago
- An algorithm for static activation quantization of LLMs ☆113 Updated 2 weeks ago
- ☆22 Updated 6 months ago
- PB-LLM: Partially Binarized Large Language Models ☆150 Updated last year
- ☆65 Updated last week
- Official implementation of the ICLR 2024 paper AffineQuant ☆24 Updated 10 months ago
- Low-Rank Llama Custom Training ☆21 Updated 10 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆221 Updated 6 months ago
- ☆111 Updated 4 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆86 Updated this week
- Fast low-bit matmul kernels in Triton ☆231 Updated this week
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu ☆52 Updated 2 months ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆30 Updated 4 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆43 Updated 6 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆73 Updated this week
- ☆99 Updated 5 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆116 Updated 11 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆90 Updated last year
- ☆27 Updated 10 months ago
- Triton kernels for Flux ☆19 Updated last month
- ☆23 Updated 3 months ago
- Patch convolution to avoid large GPU memory usage of Conv2D ☆84 Updated 3 weeks ago