facebookresearch / any4
Quantize transformers to any learned arbitrary 4-bit numeric format
☆50 Updated 6 months ago
Alternatives and similar repositories for any4
Users who are interested in any4 are comparing it to the libraries listed below.
- Framework to reduce autotune overhead to zero for well known deployments. ☆94 Updated 4 months ago
- ☆117 Updated 8 months ago
- Explore training for quantized models ☆26 Updated 6 months ago
- ☆84 Updated last year
- A block oriented training approach for inference time optimization. ☆34 Updated last year
- ☆71 Updated 10 months ago
- Work in progress. ☆79 Updated 2 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆93 Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 Updated 2 years ago
- ☆117 Updated 3 weeks ago
- Odysseus: Playground of LLM Sequence Parallelism ☆79 Updated last year
- Quantized Attention on GPU ☆44 Updated last year
- ☆133 Updated 8 months ago
- This repository contains code for the MicroAdam paper. ☆22 Updated last year
- ☆40 Updated last year
- ☆160 Updated 2 years ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆141 Updated 8 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆112 Updated last year
- ☆115 Updated last year
- extensible collectives library in triton ☆93 Updated 10 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆40 Updated last year
- ☆52 Updated last year
- Repository for CPU Kernel Generation for LLM Inference ☆27 Updated 2 years ago
- Triton-based Symmetric Memory operators and examples ☆80 Updated 2 weeks ago
- ☆93 Updated 2 months ago
- Autonomous GPU Kernel Generation via Deep Agents ☆223 Updated this week
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆55 Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆105 Updated 7 months ago
- ☆158 Updated 11 months ago