facebookresearch / any4
Quantize transformers to any learned arbitrary 4-bit numeric format
☆48 Updated 2 months ago
Alternatives and similar repositories for any4
Users who are interested in any4 are comparing it to the libraries listed below.
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 Updated last year
- Framework to reduce autotune overhead to zero for well known deployments. ☆84 Updated 2 weeks ago
- ☆82 Updated 8 months ago
- ☆98 Updated last month
- ☆35 Updated last year
- A block oriented training approach for inference time optimization. ☆34 Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆82 Updated last year
- FlexAttention w/ FlashAttention3 Support ☆27 Updated last year
- ☆99 Updated 4 months ago
- Work in progress. ☆74 Updated 3 months ago
- This repository contains code for the MicroAdam paper. ☆19 Updated 9 months ago
- Make triton easier ☆47 Updated last year
- ☆113 Updated last year
- Repository for CPU Kernel Generation for LLM Inference ☆26 Updated 2 years ago
- ☆24 Updated 6 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆77 Updated last year
- ☆129 Updated 4 months ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆25 Updated 2 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆110 Updated 11 months ago
- ☆42 Updated last week
- Quantized Attention on GPU ☆44 Updated 10 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆100 Updated 3 months ago
- ☆72 Updated 6 months ago
- extensible collectives library in triton ☆88 Updated 6 months ago
- ☆158 Updated 2 years ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆43 Updated 3 months ago
- A bunch of kernels that might make stuff slower 😉 ☆59 Updated this week
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆19 Updated last week
- Transformers components but in Triton ☆34 Updated 4 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆97 Updated 3 months ago