huggingface / optimum-amd
AMD-related optimizations for transformer models
☆77 · Updated 7 months ago
Alternatives and similar repositories for optimum-amd
Users who are interested in optimum-amd are comparing it to the libraries listed below:
- Fast and memory-efficient exact attention ☆172 · Updated last week (usage sketch after this list)
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated last week (usage sketch after this list)
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆117 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆263 · Updated 7 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated 9 months ago
- ☆46 · Updated last week
- Fast low-bit matmul kernels in Triton ☆311 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆301 · Updated last week
- ☆210 · Updated 4 months ago
- Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU) ☆186 · Updated this week
- Google TPU optimizations for transformer models ☆112 · Updated 4 months ago
- ☆130 · Updated 2 months ago
- Load compute kernels from the Hub ☆139 · Updated last week
- Inference server benchmarking tool ☆67 · Updated last month
- Development repository for the Triton language and compiler ☆122 · Updated this week (kernel sketch after this list)
- No-code CLI designed for accelerating ONNX workflows ☆192 · Updated 3 weeks ago
- Large Language Model Text Generation Inference on Habana Gaudi ☆33 · Updated 2 months ago
- ☆119 · Updated last year
- Python package of rocm-smi-lib ☆21 · Updated 8 months ago
- ☆71 · Updated 2 months ago
- A tool to configure, launch and manage your machine learning experiments ☆153 · Updated this week
- ☆74 · Updated 6 months ago
- Python bindings for ggml ☆141 · Updated 9 months ago
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆69 · Updated 2 weeks ago
- Easy and Efficient Quantization for Transformers ☆198 · Updated 3 months ago
- Inference Llama 2 with a model compiled to native code by TorchInductor ☆14 · Updated last year
- AI Tensor Engine for ROCm ☆201 · Updated this week
- ☆194 · Updated last month
- OpenAI Triton backend for Intel® GPUs ☆187 · Updated this week
- GPTQ inference Triton kernel ☆300 · Updated 2 years ago
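
For the "Fast and memory-efficient exact attention" entry (FlashAttention), here is a minimal usage sketch. It assumes the `flash-attn` package is installed and an fp16-capable GPU is available; the listed ☆172 repository is likely an AMD fork, and this sketch uses the upstream Python API with arbitrary example shapes.

```python
# Minimal FlashAttention call, assuming the `flash-attn` package and an
# fp16-capable GPU. Shapes here are arbitrary example values.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
# The kernel expects fp16/bf16 tensors laid out as (batch, seqlen, nheads, headdim).
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed in tiles so the full
# seqlen x seqlen score matrix is never materialized in memory.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```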
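The vLLM entries share the same offline-inference API. A minimal sketch, assuming `vllm` is installed and a GPU with enough memory for the chosen checkpoint; the model name below is an arbitrary small example, not something the list prescribes.

```python
# Minimal offline batch generation with vLLM. The model checkpoint is an
# arbitrary small example; swap in any model the engine supports.
from vllm import LLM, SamplingParams

prompts = [
    "Continuous batching improves throughput because",
    "PagedAttention stores the KV cache in",
]
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # loads the model and allocates KV-cache blocks
# generate() schedules all prompts through the engine and returns one
# RequestOutput per prompt, each holding the generated completions.
for output in llm.generate(prompts, sampling):
    print(output.prompt, "->", output.outputs[0].text)
```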
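Several entries (the low-bit matmul kernels, the Triton development repository, the Intel GPU backend, and the GPTQ inference kernel) are built on the Triton language. As a reference point, here is a minimal Triton kernel, a vector add; it assumes `triton` and PyTorch with a supported GPU, and the block size of 1024 is an arbitrary choice.

```python
# Minimal Triton kernel (vector add), showing the programming model the
# Triton-based entries above build on. Assumes `triton`, `torch`, and a GPU.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program instance processes one BLOCK-sized tile of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements  # guard the final, possibly ragged tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)  # one program per tile
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
assert torch.allclose(out, x + y)
```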