neuralmagic / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
★167 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for guidellm
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ★134 · Updated 3 months ago
- experiments with inference on llama ★105 · Updated 5 months ago
- An Open Source Toolkit For LLM Distillation ★359 · Updated 2 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ★181 · Updated 3 weeks ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ★691 · Updated this week
- ★123 · Updated this week
- Self-host LLMs with vLLM and BentoML ★74 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ★253 · Updated last month
- ★94 · Updated 2 months ago
- awesome synthetic (text) datasets ★243 · Updated 3 weeks ago
- Production ready LLM model compression/quantization toolkit with accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang. ★126 · Updated this week
- Manage scalable open LLM inference endpoints in Slurm clusters ★238 · Updated 4 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ★173 · Updated 4 months ago
- A collection of all available inference solutions for the LLMs ★73 · Updated 2 months ago
- ★200 · Updated 9 months ago
- Tutorial for building LLM router ★163 · Updated 4 months ago
- Advanced Quantization Algorithm for LLMs. This is official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t… ★248 · Updated this week
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ★134 · Updated 2 months ago
- ★130 · Updated this week
- This project showcases an LLMOps pipeline that fine-tunes a small-size LLM model to prepare for the outage of the service LLM. ★290 · Updated 2 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ★196 · Updated 7 months ago
- Synthetic Data for LLM Fine-Tuning ★97 · Updated 11 months ago
- Let's build better datasets, together! ★206 · Updated this week
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ★221 · Updated 6 months ago
- ★193 · Updated this week
- ★112 · Updated this week
- Late Interaction Models Training & Retrieval ★166 · Updated this week
- A pipeline for LLM knowledge distillation ★78 · Updated 3 months ago
- ★133 · Updated 4 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ★261 · Updated 4 months ago