Said-Akbar / vllm-rocm
Fork of vLLM for AMD MI25/50/60. A high-throughput and memory-efficient inference and serving engine for LLMs
☆23 · Updated 3 weeks ago
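Like upstream vLLM, this fork serves models through an OpenAI-compatible HTTP API. A minimal sketch of building a request payload for that endpoint; the model name below is a placeholder assumption, not something this fork pins:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Build a JSON body for vLLM's OpenAI-compatible
    /v1/chat/completions endpoint."""
    payload = {
        "model": model,  # placeholder model id, adjust for your deployment
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

# POST this body to http://<host>:8000/v1/chat/completions on a running server.
request_body = build_chat_request("meta-llama/Llama-3-8B-Instruct", "Hello!")
```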
Alternatives and similar repositories for vllm-rocm:
Users interested in vllm-rocm are comparing it to the libraries listed below:
- Triton for AMD MI25/50/60. Development repository for the Triton language and compiler ☆17 · Updated 2 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆231 · Updated this week
- Core, Junction, and VRAM temperature reader for Linux + GDDR6/GDDR6X GPUs ☆38 · Updated 3 months ago
- ☆40 · Updated last year
- SLOP Detector and analyzer based on dictionary for ShareGPT JSON and text ☆65 · Updated 5 months ago
- GPU Power and Performance Manager ☆57 · Updated 5 months ago
- Lightweight Inference server for OpenVINO ☆143 · Updated this week
- Privacy-first agentic framework with powerful reasoning & task automation capabilities. Natively distributed and fully ISO 27XXX complian… ☆66 · Updated last week
- Simple monkeypatch to boost AMD Navi 3 GPUs ☆36 · Updated 10 months ago
- Transparent proxy server with on-demand model swapping for llama.cpp (or any local OpenAI-compatible server) ☆482 · Updated last week
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2 ☆148 · Updated 10 months ago
- 8-bit CUDA functions for PyTorch, ported to HIP for use in AMD GPUs ☆49 · Updated last year
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆55 · Updated last month
- 8-bit CUDA functions for PyTorch, ROCm compatible ☆39 · Updated last year
- Open source LLM UI, compatible with all local LLM providers ☆173 · Updated 6 months ago
- Web UI for ExLlamaV2 ☆486 · Updated last month
- Train your own small BitNet model ☆65 · Updated 5 months ago
- Croco.Cpp is a 3rd party testground for KoboldCPP, a simple one-file way to run various GGML/GGUF models with KoboldAI's UI. (for Croco.C… ☆101 · Updated this week
- Run DeepSeek-R1 GGUFs on KTransformers ☆212 · Updated last month
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en… ☆37 · Updated 7 months ago
- A daemon that automatically manages the performance states of NVIDIA GPUs ☆64 · Updated 5 months ago
- Testing LLM reasoning abilities with family relationship quizzes ☆62 · Updated 2 months ago
- Automatically quant GGUF models ☆164 · Updated this week
- AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models ☆152 · Updated 11 months ago
- A zero dependency web UI for any LLM backend, including KoboldCpp, OpenAI and AI Horde ☆107 · Updated this week
- Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation … ☆178 · Updated 8 months ago
- ☆46 · Updated last month
- LLM inference in C/C++ ☆19 · Updated last week
- Training PRO extension for oobabooga WebUI - recent dev version ☆48 · Updated 2 months ago
- Fast and memory-efficient exact attention ☆163 · Updated this week