dphnAI/aphrodite-engine

dphnAI / aphrodite-engine

Large-scale LLM inference engine

☆1,771

Alternatives and similar repositories for aphrodite-engine

Users who are interested in aphrodite-engine are comparing it to the libraries listed below.

theroyallab / tabbyAPI
The official API server for Exllama. OAI compatible, lightweight, and fast.
☆1,261 · Updated this week
turboderp-org / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
☆4,567 · Mar 4, 2026 · Updated 3 months ago
axolotl-ai-cloud / axolotl
Go ahead and axolotl questions
☆12,082 · Updated this week
turboderp-org / exui
Web UI for ExLlamaV2
☆513 · Feb 5, 2025 · Updated last year
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆7,190 · Jun 17, 2026 · Updated last week
turboderp-org / exllamav3
An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs
☆977 · Jun 21, 2026 · Updated last week
itsme2417 / PolyMind
A multimodal, function calling powered LLM webui.
☆213 · Sep 23, 2024 · Updated last year
turboderp / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆2,924 · Sep 30, 2023 · Updated 2 years ago
michaelfeil / infinity
Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali
☆2,857 · Mar 24, 2026 · Updated 3 months ago
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,862 · Mar 21, 2026 · Updated 3 months ago
e-p-armstrong / augmentoolkit
Create Custom LLMs
☆1,857 · Apr 24, 2026 · Updated 2 months ago
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆29,694 · Updated this week
jondurbin / airoboros
Customizable implementation of the self-instruct paper.
☆1,052 · Mar 7, 2024 · Updated 2 years ago
Vahe1994 / AQLM
Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.p…
☆1,322 · Feb 26, 2026 · Updated 4 months ago
algorithmicsuperintelligence / optillm
Optimizing inference proxy for LLMs
☆4,167 · May 7, 2026 · Updated last month
vllm-project / llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆3,439 · Updated this week
EricLBuehler / mistral.rs
Fast, flexible LLM inference
☆7,362 · Updated this week
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆108 · Apr 29, 2024 · Updated 2 years ago
LostRuins / koboldcpp
Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
☆10,887 · Updated this week
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
☆2,349 · May 11, 2025 · Updated last year
InternLM / lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
☆7,928 · Updated this week
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆83,677 · Updated this week
AnswerDotAI / fsdp_qlora
Training LLMs with QLoRA + FSDP
☆1,549 · Nov 9, 2024 · Updated last year
arcee-ai / PruneMe
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆267 · Apr 23, 2024 · Updated 2 years ago
predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆3,800 · May 28, 2026 · Updated last month
dottxt-ai / outlines
Structured Outputs
☆14,273 · Updated this week
AutoGPTQ / AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
☆5,072 · Apr 11, 2025 · Updated last year
lmg-anon / mikupad
LLM Frontend in a single html file
☆740 · Dec 27, 2025 · Updated 6 months ago
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆3,300 · Jun 22, 2026 · Updated last week
unslothai / unsloth
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
☆67,571 · Updated this week
MeetKai / functionary
Chat language model that can use tools and interpret the results
☆1,596 · Dec 3, 2025 · Updated 6 months ago
oobabooga / textgen
Open-source desktop app for local LLMs. Text, vision, tool-calling, OpenAI/Anthropic-compatible API. 100% private.
☆47,360 · Jun 2, 2026 · Updated 3 weeks ago
zenoverflow / omnichain
Efficient visual programming for AI language models
☆359 · May 13, 2025 · Updated last year
IST-DASLab / marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆1,095 · Sep 4, 2024 · Updated last year
linkedin / Liger-Kernel
Efficient Triton Kernels for LLM Training
☆6,456 · Jun 23, 2026 · Updated last week
theroyallab / YALS
☆98 · Mar 28, 2026 · Updated 3 months ago
langroid / langroid
Harness LLMs with Multi-Agent Programming
☆4,049 · Jun 15, 2026 · Updated 2 weeks ago
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,867 · Updated this week
neuml / txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
☆12,683 · Jun 22, 2026 · Updated last week