Anonymous1252022 / fp4-all-the-way
☆41 · Updated 7 months ago
Alternatives and similar repositories for fp4-all-the-way
Users interested in fp4-all-the-way are comparing it to the repositories listed below
- Work in progress. ☆75 · Updated last month
- ☆113 · Updated last month
- ☆159 · Updated 6 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆106 · Updated 2 months ago
- Official implementation for Training LLMs with MXFP4 ☆115 · Updated 8 months ago (a minimal MXFP4 block-quantization sketch follows the list)
- ☆66 · Updated 6 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆111 · Updated last year
- ☆133 · Updated 6 months ago
- ☆37 · Updated last year
- ☆52 · Updated 7 months ago
- ☆156 · Updated 10 months ago
- QuIP quantization ☆61 · Updated last year
- ☆83 · Updated 11 months ago
- An extension to the GaLore paper, to perform Natural Gradient Descent in a low-rank subspace ☆18 · Updated last year
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆148 · Updated last month
- ☆115 · Updated 7 months ago
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT model pre-training ☆34 · Updated 6 months ago
- [ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications ☆52 · Updated last month
- A framework to compare low-bit integer and floating-point formats ☆50 · Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆89 · Updated last year (a 2:4 magnitude-pruning sketch follows the list)
- This repository contains code for the MicroAdam paper. ☆21 · Updated last year
- ☆50 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆126 · Updated 6 months ago
- ☆15 · Updated 5 months ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx… ☆29 · Updated 10 months ago (a Hadamard-rotation sketch follows the list)
- KV cache compression for high-throughput LLM inference ☆148 · Updated 10 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆172 · Updated last year
- ☆31 · Updated last year
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton. ☆148 · Updated last month
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Updated last year
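
For orientation, a minimal sketch of the MXFP4-style block quantization that several entries above revolve around, based on the OCP Microscaling format: 4-bit E2M1 elements (values 0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, ±6) sharing one power-of-two scale per block of 32. Illustrative only; all names are ours, and this is not code from any repository listed above.

```python
import numpy as np

# Representable non-negative magnitudes of FP4 E2M1.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_mxfp4(x: np.ndarray, block: int = 32) -> np.ndarray:
    """Quantize-dequantize a 1-D array to MXFP4 (per-block power-of-two scale)."""
    out = np.empty_like(x, dtype=np.float64)
    for start in range(0, x.size, block):
        chunk = x[start:start + block]
        amax = np.abs(chunk).max()
        if amax == 0:
            out[start:start + block] = 0.0
            continue
        # Shared scale: 2^(floor(log2(amax)) - 2), since E2M1's largest
        # exponent is 2 (6 = 1.5 * 2^2). Elements above 6/scale clip to 6.
        scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
        scaled = chunk / scale
        # Round each element to the nearest representable FP4 magnitude.
        nearest = np.abs(np.abs(scaled)[:, None] - FP4_GRID).argmin(axis=1)
        out[start:start + block] = np.sign(scaled) * FP4_GRID[nearest] * scale
    return out

x = np.random.randn(128)
print("max abs error:", np.abs(x - fake_quantize_mxfp4(x)).max())
```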
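The 2:4 sparsity entry refers to the structured pattern accelerated by NVIDIA sparse tensor cores: in every contiguous group of four weights, keep the two largest magnitudes and zero the rest. A hedged sketch of that pruning step (our own helper name, not the repository's API):

```python
import numpy as np

def prune_2_of_4(w: np.ndarray) -> np.ndarray:
    """Apply a 2:4 magnitude mask along the last axis (size divisible by 4)."""
    groups = w.reshape(-1, 4)
    # Indices of the two smallest-magnitude entries in each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.random.randn(4, 8)
print(prune_2_of_4(w))  # exactly two nonzeros per group of four
```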
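Finally, the idea behind the Hadamard-assisted methods above (QuIP, HALO): rotate weights with an orthogonal Hadamard matrix so outliers are spread across coordinates before low-bit rounding, then invert the rotation afterwards. A minimal sketch under those assumptions, using a plain symmetric uniform quantizer rather than either repository's actual method:

```python
import numpy as np

def hadamard_matrix(n: int) -> np.ndarray:
    """Sylvester construction: orthonormal Hadamard matrix, n a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def quantize_with_rotation(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Fake-quantize H @ W uniformly, then undo the rotation with H^T."""
    h = hadamard_matrix(w.shape[0])
    rotated = h @ w  # outliers are now smeared across many coordinates
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(rotated).max() / levels
    q = np.round(rotated / scale) * scale
    return h.T @ q  # H is orthogonal, so H^T inverts the rotation

w = np.random.randn(64, 64)
w[0, 0] = 25.0  # an outlier that would dominate a plain uniform quantizer
print("error with rotation:", np.abs(w - quantize_with_rotation(w)).max())
```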