PygmalionAI / aphrodite-engine

Large-scale LLM inference engine

☆1,106

Related projects ⓘ

Alternatives and complementary repositories for aphrodite-engine

theroyallab / tabbyAPI
An OAI compatible exllamav2 API that's both lightweight and fast
☆570Updated last week
turboderp / exui
Web UI for ExLlamaV2
☆438Updated last month
e-p-armstrong / augmentoolkit
Convert Compute And Books Into Instruct-Tuning Datasets! Makes: QA, RP, Classifiers.
☆1,017Updated this week
predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆2,178Updated this week
marella / ctransformers
Python bindings for the Transformer models implemented in C/C++ using GGML library.
☆1,811Updated 9 months ago
Maximilian-Winter / llama-cpp-agent
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM …
☆488Updated 3 months ago
mobiusml / hqq
Official implementation of Half-Quadratic Quantization (HQQ)
☆698Updated last week
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆1,612Updated this week
NousResearch / Hermes-Function-Calling
☆705Updated last month
jondurbin / airoboros
Customizable implementation of the self-instruct paper.
☆1,018Updated 8 months ago
turboderp / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
☆3,634Updated last week
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
☆1,743Updated last month
NVIDIA / RULER
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
☆694Updated 2 weeks ago
noamgat / lm-format-enforcer
Enforce the output format (JSON Schema, Regex etc) of a language model
☆1,520Updated 3 weeks ago
vllm-project / llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆661Updated this week
kvcache-ai / ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
☆730Updated this week
FailSpy / abliterator
Simple Python library/structure to ablate features in LLMs which are supported by TransformerLens
☆324Updated 4 months ago
itsme2417 / PolyMind
A multimodal, function calling powered LLM webui.
☆205Updated last month
runpod-workers / worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
☆242Updated last week
rmihaylov / falcontune
Tune any FALCON in 4-bit
☆468Updated last year
MDK8888 / GPTFast
Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch.
☆687Updated 2 months ago
abacaj / fine-tune-mistral
Fine-tune mistral-7B on 3090s, a100s, h100s
☆702Updated last year
Cornell-RelaxML / quip-sharp
☆501Updated last week
michaelfeil / infinity
Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models, clip, clap and colpali
☆1,430Updated this week
epolewski / EricLLM
A fast batching API to serve LLM models
☆172Updated 6 months ago
huggingface / optimum-nvidia
☆889Updated 3 weeks ago
S-LoRA / S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
☆1,745Updated 9 months ago
AI-Commandos / LLaMa2lang
Convenience scripts to finetune (chat-)LLaMa3 and other models for any language
☆276Updated 4 months ago
Leeroo-AI / mergoo
A library for easily merging multiple LLM experts, and efficiently train the merged LLM.
☆401Updated 2 months ago
apoorvumang / prompt-lookup-decoding
☆465Updated 2 months ago