huggingface / optimum-amd
AMD-related optimizations for transformer models
☆63 Updated 2 months ago
Alternatives and similar repositories for optimum-amd:
Users who are interested in optimum-amd are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention ☆151 Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 Updated this week
- Google TPU optimizations for transformers models ☆86 Updated this week
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ, and export to onnx/onnx-runtime easily. ☆153 Updated 3 months ago
- The no-code AI toolchain ☆80 Updated this week
- ☆116 Updated 8 months ago
- ☆62 Updated last month
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆64 Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆165 Updated this week
- QuIP quantization ☆48 Updated 10 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆257 Updated 3 months ago
- ☆52 Updated last month
- Advanced Quantization Algorithm for LLMs/VLMs. ☆344 Updated this week
- An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆230 Updated 2 months ago
- KV cache compression for high-throughput LLM inference ☆103 Updated last month
- Fast low-bit matmul kernels in Triton ☆187 Updated last week
- ☆99 Updated 3 weeks ago
- An innovative library for efficient LLM inference via low-bit quantization ☆352 Updated 4 months ago
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆107 Updated last month
- ☆185 Updated last month
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆219 Updated last week
- ☆167 Updated 3 months ago
- vLLM performance dashboard ☆20 Updated 8 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆75 Updated this week
- Materials for learning SGLang ☆166 Updated last week
- ☆57 Updated 7 months ago
- ☆21 Updated last week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆280 Updated last month
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 Updated last year