Large-scale LLM inference engine
☆1,736 · May 8, 2026 · Updated last week
Alternatives and similar repositories for aphrodite-engine
Users who are interested in aphrodite-engine are comparing it to the libraries listed below.
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,219 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,521 · Mar 4, 2026 · Updated 2 months ago
- Go ahead and axolotl questions ☆11,938 · Updated this week
- Web UI for ExLlamaV2 ☆511 · Feb 5, 2025 · Updated last year
- Tools for merging pretrained large language models. ☆7,083 · May 6, 2026 · Updated 2 weeks ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆878 · Updated this week
- A multimodal, function calling powered LLM webui. ☆213 · Sep 23, 2024 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,921 · Sep 30, 2023 · Updated 2 years ago
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali ☆2,798 · Mar 24, 2026 · Updated last month
- Large Language Model Text Generation Inference ☆10,853 · Mar 21, 2026 · Updated last month
- Create Custom LLMs ☆1,843 · Apr 24, 2026 · Updated 3 weeks ago
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆27,836 · Updated this week
- Customizable implementation of the self-instruct paper. ☆1,052 · Mar 7, 2024 · Updated 2 years ago
- Optimizing inference proxy for LLMs ☆3,856 · May 7, 2026 · Updated last week
- Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.p… ☆1,320 · Feb 26, 2026 · Updated 2 months ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆3,262 · Updated this week
- Fast, flexible LLM inference ☆7,130 · Apr 15, 2026 · Updated last month
- An unsupervised model merging algorithm for Transformers-based language models. ☆108 · Apr 29, 2024 · Updated 2 years ago
- Run GGUF models easily with a KoboldAI UI. One File. Zero Install. ☆10,534 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,336 · May 11, 2025 · Updated last year
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,858 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆80,418 · Updated this week
- Training LLMs with QLoRA + FSDP ☆1,542 · Nov 9, 2024 · Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆267 · Apr 23, 2024 · Updated 2 years ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,781 · May 21, 2025 · Updated 11 months ago
- Structured Outputs ☆13,846 · May 13, 2026 · Updated last week
- An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm. ☆5,060 · Apr 11, 2025 · Updated last year
- LLM Frontend in a single html file ☆727 · Dec 27, 2025 · Updated 4 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,217 · Apr 27, 2026 · Updated 3 weeks ago
- Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally. ☆64,485 · Updated this week
- Chat language model that can use tools and interpret the results ☆1,594 · Dec 3, 2025 · Updated 5 months ago
- Open-source desktop app for local LLMs. Text, vision, tool-calling, OpenAI/Anthropic-compatible API. 100% private. ☆47,155 · Updated this week
- Efficient visual programming for AI language models ☆361 · May 13, 2025 · Updated last year
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens. ☆1,076 · Sep 4, 2024 · Updated last year
- Efficient Triton Kernels for LLM Training ☆6,365 · Updated this week
- ☆97 · Mar 28, 2026 · Updated last month
- Harness LLMs with Multi-Agent Programming ☆4,015 · May 6, 2026 · Updated 2 weeks ago
- FlashInfer: Kernel Library for LLM Serving ☆5,621 · Updated this week
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,577 · May 12, 2026 · Updated last week