huggingface / optimum-amd
AMD-related optimizations for transformer models
☆81 · Updated last month
Alternatives and similar repositories for optimum-amd
Users interested in optimum-amd are comparing it to the libraries listed below.
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated last week
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated 9 months ago
- Fast and memory-efficient exact attention ☆179 · Updated last week
- ☆215 · Updated 6 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆142 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆307 · Updated 2 months ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and easy export to onnx/onnx-runtime ☆175 · Updated 4 months ago
- ☆120 · Updated last year
- No-code CLI designed for accelerating ONNX workflows ☆207 · Updated last month
- Easy and efficient quantization for Transformers ☆198 · Updated last month
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆277 · Updated last year
- ☆145 · Updated last month
- Fast low-bit matmul kernels in Triton ☆339 · Updated this week
- GPTQ inference Triton kernel ☆303 · Updated 2 years ago
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆191 · Updated this week
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆376 · Updated last year
- ☆76 · Updated 8 months ago
- Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆564 · Updated last week
- High-speed and easy-to-use LLM serving framework for local deployment ☆115 · Updated 4 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆289 · Updated 2 months ago
- QuIP quantization ☆55 · Updated last year
- ☆195 · Updated 3 months ago
- Load compute kernels from the Hub ☆220 · Updated last week
- llama.cpp to PyTorch converter ☆34 · Updated last year
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆481 · Updated this week
- ☆161 · Updated 2 weeks ago
- Official implementation for training LLMs with MXFP4 ☆55 · Updated 3 months ago
- Development repository for the Triton language and compiler ☆127 · Updated this week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 9 months ago