huggingface / optimum-amd
AMD-related optimizations for transformer models
☆71 · Updated 4 months ago
Alternatives and similar repositories for optimum-amd:
Users interested in optimum-amd are comparing it to the libraries listed below
- ☆112 · Updated this week
- Google TPU optimizations for transformer models ☆104 · Updated 2 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list) ☆87 · Updated this week
- Fast and memory-efficient exact attention ☆162 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 5 months ago
- ☆116 · Updated 11 months ago
- QuIP quantization ☆52 · Updated last year
- Load compute kernels from the Hub ☆99 · Updated this week
- ☆69 · Updated 4 months ago
- ☆62 · Updated last month
- 1.58-bit LLaMa model ☆82 · Updated 11 months ago
- Easy and Efficient Quantization for Transformers ☆195 · Updated last month
- Repository for CPU Kernel Generation for LLM Inference ☆25 · Updated last year
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to onnx/onnx-runtime. ☆164 · Updated 3 weeks ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆35 · Updated 10 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk (see the format sketch after this list) ☆91 · Updated this week
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆110 · Updated 3 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 5 months ago
- ☆64 · Updated 3 months ago
- ☆203 · Updated 2 months ago
- ☆66 · Updated 10 months ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models ☆36 · Updated last year
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆79 · Updated 2 weeks ago
- Python package for rocm-smi-lib ☆20 · Updated 6 months ago
- This reference can be used with any existing OpenAI-integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste… ☆120 · Updated last year
- Advanced Quantization Algorithm for LLMs/VLMs. ☆403 · Updated last week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆180 · Updated this week
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆99 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆201 · Updated 4 months ago
- Use safetensors with ONNX 🤗 ☆50 · Updated 3 weeks ago
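
Several entries above are vLLM forks or derivatives. For orientation, here is a minimal sketch of the upstream vLLM offline-inference API; the model name and sampling values are illustrative placeholders, not taken from any listed repository:

```python
# Minimal vLLM offline-inference sketch; assumes `pip install vllm`
# and a supported GPU (CUDA or ROCm). Model and sampling values are
# placeholders for illustration only.
from vllm import LLM, SamplingParams

# Load a small model once; vLLM manages KV-cache memory internally.
llm = LLM(model="facebook/opt-125m")

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches prompts for high-throughput decoding.
outputs = llm.generate(["AMD GPUs are", "Quantization reduces"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```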
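Likewise, the safetensors-related entries (the sparse-quantized-tensor extension and the ONNX bridge) build on the base safetensors format. A minimal sketch of that base save/load round trip with PyTorch, to show what those extensions wrap; the tensor names and shapes are made up for the example:

```python
# Base safetensors round trip; assumes `pip install safetensors torch`.
# The listed repos extend this format (sparse/quantized storage, ONNX);
# this shows only the plain dense-tensor case.
import torch
from safetensors.torch import save_file, load_file

tensors = {
    "embedding.weight": torch.zeros((1024, 768)),
    "lm_head.weight": torch.zeros((768, 1024)),
}
save_file(tensors, "model.safetensors")   # flat on-disk dict of tensors

loaded = load_file("model.safetensors")   # returns a dict of torch.Tensor
assert loaded["embedding.weight"].shape == (1024, 768)
```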