arlo-phoenix / CTranslate2-rocm
Fast inference engine for Transformer models
☆54 · Updated last year
Alternatives and similar repositories for CTranslate2-rocm
Users interested in CTranslate2-rocm are comparing it to the libraries listed below.
- ☆418 · Updated 8 months ago
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆216 · Updated last week
- Simple monkeypatch to boost AMD Navi 3 GPUs ☆48 · Updated 7 months ago
- AI inferencing at the edge. A simple one-file way to run various GGML models with KoboldAI's UI, with AMD ROCm offloading ☆718 · Updated 2 weeks ago
- Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI endpoints. ☆260 · Updated last week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆588 · Updated this week
- ☆235 · Updated 2 years ago
- The official API server for ExLlama. OAI-compatible, lightweight, and fast. ☆1,096 · Updated 2 weeks ago
- Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000-series GPUs ☆106 · Updated 7 months ago
- Input text from speech in any Linux window, the lean, fast, and accurate way, using whisper.cpp OFFLINE. Speak with local LLMs via llama.c… ☆152 · Updated 4 months ago
- ROCm Docker images with fixes/support for extra architectures, such as gfx803/gfx1010 ☆31 · Updated 2 years ago
- ☆87 · Updated 2 weeks ago
- 8-bit CUDA functions for PyTorch ☆68 · Updated 2 months ago
- An OpenAI API-compatible text-to-speech server using Coqui AI's xtts_v2 and/or Piper TTS as the backend ☆837 · Updated 10 months ago
- Core, junction, and VRAM temperature reader for Linux + GDDR6/GDDR6X GPUs ☆61 · Updated last month
- Build scripts for ROCm ☆188 · Updated last year
- ☆496 · Updated this week
- Web UI for ExLlamaV2 ☆514 · Updated 10 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆1,358 · Updated this week
- A daemon that automatically manages the performance states of NVIDIA GPUs ☆100 · Updated last month
- ☆48 · Updated 2 years ago
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs. ☆488 · Updated last week
- LLM frontend in a single HTML file ☆670 · Updated 3 weeks ago
- Reliable model swapping for any local OpenAI/Anthropic-compatible server (llama.cpp, vLLM, etc.) ☆1,977 · Updated last week
- 8-bit CUDA functions for PyTorch, ROCm-compatible ☆41 · Updated last year
- A complete package that provides you with all the components needed to get started or dive deeper into Machine Learning Workloads on Cons… ☆42 · Updated last month
- DEPRECATED! ☆50 · Updated last year
- Croco.Cpp is a fork of KoboldCPP for inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… ☆153 · Updated this week
- CUDA on AMD GPUs ☆584 · Updated 3 months ago
- AMD APU-compatible Ollama. Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3, and other models. ☆133 · Updated last week