eliranwong / MultiAMDGPU_AIDev_Ubuntu
Multi AMD GPU Setup for AI Development on Ubuntu with ROCM
☆42 · Updated last month
Alternatives and similar repositories for MultiAMDGPU_AIDev_Ubuntu
Users that are interested in MultiAMDGPU_AIDev_Ubuntu are comparing it to the libraries listed below
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆260 · Updated last week
- A multimodal, function-calling-powered LLM web UI. ☆217 · Updated last year
- AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 24.04.1 ☆216 · Updated 2 weeks ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆597 · Updated this week
- A fast batching API to serve LLM models ☆189 · Updated last year
- Fully-featured, beautiful web interface for vLLM - built with NextJS. ☆163 · Updated last week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆610 · Updated 9 months ago
- A platform to self-host AI on easy mode ☆179 · Updated this week
- An OpenAI-compatible API for chat with image input and questions about the images (aka multimodal). ☆266 · Updated 9 months ago
- Web UI for ExLlamaV2 ☆514 · Updated 10 months ago
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆219 · Updated 3 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible ☆344 · Updated 9 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆148 · Updated 5 months ago
- Distributed inference for MLX LLMs ☆99 · Updated last year
- LLM inference in C/C++ ☆103 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ☆1,387 · Updated this week
- ☆209 · Updated 3 months ago
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆386 · Updated last week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆753 · Updated this week
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆85 · Updated this week
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆155 · Updated 3 months ago
- GPU Power and Performance Manager ☆62 · Updated last year
- LM inference server implementation based on *.cpp. ☆293 · Updated 3 weeks ago
- LLM Benchmark for Throughput via Ollama (Local LLMs) ☆314 · Updated this week
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com. ☆119 · Updated last year
- Wraps any OpenAI API interface as Responses with MCP support so it supports Codex, adding any missing stateful features. Ollama and Vllm… ☆138 · Updated last month
- llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work ☆278 · Updated 3 months ago
- Open source LLM UI, compatible with all local LLM providers. ☆176 · Updated last year
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX. ☆99 · Updated 5 months ago