PygmalionAI / aphrodite-engine
Large-scale LLM inference engine
☆1,140 · Updated this week
Related projects
Alternatives and complementary repositories for aphrodite-engine
- An OAI-compatible exllamav2 API that's both lightweight and fast ☆605 · Updated this week
- Web UI for ExLlamaV2 ☆446 · Updated last month
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆3,685 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) ☆702 · Updated this week
- This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models? ☆732 · Updated 3 weeks ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆1,765 · Updated this week
- Customizable implementation of the self-instruct paper. ☆1,024 · Updated 8 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆494 · Updated 3 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆1,647 · Updated this week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,760 · Updated last year
- Convert Compute And Books Into Instruct-Tuning Datasets! Makes: QA, RP, Classifiers. ☆1,038 · Updated 2 weeks ago
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ☆1,815 · Updated 9 months ago
- Optimizing inference proxy for LLMs ☆1,582 · Updated this week
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆737 · Updated last week
- ☆722 · Updated 2 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,758 · Updated 10 months ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆2,206 · Updated this week
- Enforce the output format (JSON Schema, Regex, etc.) of a language model ☆1,553 · Updated last month
- Function calling-based LLM agents ☆278 · Updated 2 months ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆691 · Updated this week
- Serving multiple LoRA-finetuned LLMs as one ☆986 · Updated 6 months ago
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆526 · Updated this week
- ☆505 · Updated 3 weeks ago
- Automatically evaluate your LLMs in Google Colab ☆559 · Updated 6 months ago
- Convenience scripts to finetune (chat-)LLaMa3 and other models for any language ☆280 · Updated 5 months ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆499 · Updated 3 weeks ago
- A multimodal, function-calling-powered LLM webui. ☆208 · Updated last month
- Simple Python library/structure to ablate features in LLMs which are supported by TransformerLens ☆333 · Updated 5 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆126 · Updated 6 months ago
- Chat language model that can use tools and interpret the results ☆1,433 · Updated this week
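
Several of the servers above expose OpenAI-compatible endpoints, as does aphrodite-engine itself, so the same client code can target any of them. Below is a minimal sketch using the `openai` Python package; the base URL, port, API key, and model name are assumptions to be replaced with whatever your local server actually uses.

```python
# Minimal sketch: querying an OpenAI-compatible endpoint such as the one
# served by aphrodite-engine or the exllamav2 API server listed above.
# The base URL, API key, and model name are placeholders, not defaults
# confirmed from any of these projects.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:2242/v1",  # hypothetical local endpoint
    api_key="sk-empty",                   # many local servers ignore the key
)

response = client.chat.completions.create(
    model="my-local-model",               # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize what an inference engine does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```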