huggingface / optimum-amd
AMD related optimizations for transformer models
☆57 · Updated 2 weeks ago
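For orientation, a minimal sketch of running a 🤗 Transformers checkpoint through ONNX Runtime on an AMD GPU via Optimum. It assumes a ROCm build of onnxruntime is installed; the model id is illustrative, and this is one possible entry point rather than the library's only API:

```python
# Minimal sketch: a Transformers checkpoint served through ONNX Runtime
# on an AMD GPU via Optimum. Assumes a ROCm build of onnxruntime;
# the model id below is illustrative only.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to ONNX on the fly;
# ROCMExecutionProvider targets AMD GPUs.
model = ORTModelForSequenceClassification.from_pretrained(
    model_id, export=True, provider="ROCMExecutionProvider"
)

inputs = tokenizer("Optimum on AMD hardware", return_tensors="pt")
logits = model(**inputs).logits
```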
Related projects
Alternatives and complementary repositories for optimum-amd
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list) ☆89 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆253 · Updated last month
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆153 · Updated this week
- Advanced quantization algorithm for LLMs; the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t…" ☆248 · Updated this week
- Google TPU optimizations for transformers models ☆75 · Updated this week
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆187 · Updated this week
- Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU/GPU via HF, vLLM, and SGLang. ☆125 · Updated this week
- A general 2–8 bit quantization toolbox with GPTQ/AWQ/HQQ, and easy export to ONNX/ONNX Runtime ☆150 · Updated last month
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆257 · Updated this week
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆79 · Updated this week
- Applied AI experiments and examples for PyTorch ☆166 · Updated 3 weeks ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated 2 months ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆226 · Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆51 · Updated 2 months ago
- KV cache compression for high-throughput LLM inference ☆87 · Updated this week
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆112 · Updated 8 months ago
- Materials for learning SGLang ☆105 · Updated this week
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" ☆164 · Updated last week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆50 · Updated this week
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆209 · Updated 3 weeks ago
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆145 · Updated this week
- Easy and Efficient Quantization for Transformers ☆180 · Updated 4 months ago
- vLLM performance dashboard ☆18 · Updated 6 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆165 · Updated this week
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆173 · Updated 4 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆262 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆193 · Updated this week
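As referenced in the vLLM entry above, a minimal offline-generation sketch using vLLM's Python API. The model id is illustrative, and running on AMD GPUs assumes a ROCm build of vLLM:

```python
# Minimal vLLM sketch: offline batched generation (no server needed).
# The model id is illustrative; AMD GPUs require a ROCm build of vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() takes a list of prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Paged attention lets the KV cache"], params)
print(outputs[0].outputs[0].text)
```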