eliranwong / MultiAMDGPU_AIDev_Ubuntu
Multi AMD GPU Setup for AI Development on Ubuntu with ROCM
☆42 · Updated last month
Alternatives and similar repositories for MultiAMDGPU_AIDev_Ubuntu
Users that are interested in MultiAMDGPU_AIDev_Ubuntu are comparing it to the libraries listed below
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆260 · Updated last week
- A multimodal, function-calling-powered LLM web UI. ☆217 · Updated last year
- AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 24.04.1 ☆216 · Updated 2 weeks ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆597 · Updated this week
- A fast batching API to serve LLM models ☆189 · Updated last year
- Fully-featured, beautiful web interface for vLLM - built with NextJS. ☆163 · Updated last week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆610 · Updated 9 months ago
- A platform to self-host AI on easy mode ☆179 · Updated this week
- An OpenAI-compatible API for chat with image input and questions about the images (aka multimodal). ☆266 · Updated 9 months ago
- Web UI for ExLlamaV2 ☆514 · Updated 10 months ago
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆219 · Updated 3 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible ☆344 · Updated 9 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆148 · Updated 5 months ago
- Distributed inference for MLX LLMs ☆99 · Updated last year
- LLM inference in C/C++ ☆103 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ☆1,387 · Updated this week
- ☆209 · Updated 3 months ago
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆386 · Updated last week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆753 · Updated this week
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆85 · Updated this week
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆155 · Updated 3 months ago
- GPU Power and Performance Manager ☆62 · Updated last year
- LM inference server implementation based on *.cpp. ☆293 · Updated 3 weeks ago
- LLM Benchmark for Throughput via Ollama (Local LLMs) ☆314 · Updated this week
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com. ☆119 · Updated last year
- Wraps any OpenAI API interface as Responses with MCP support so it supports Codex, adding any missing stateful features. Ollama and Vllm… ☆138 · Updated last month
- llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work ☆278 · Updated 3 months ago
- Open source LLM UI, compatible with all local LLM providers. ☆176 · Updated last year
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX. ☆99 · Updated 5 months ago