neuralmagic / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
⭐224 · Updated this week
Alternatives and similar repositories for guidellm:
Users interested in guidellm are comparing it to the libraries listed below.
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ⭐136 · Updated 7 months ago
- ⭐237 · Updated last week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ⭐1,103 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐262 · Updated 5 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ⭐253 · Updated 8 months ago
- experiments with inference on llama ⭐104 · Updated 9 months ago
- An Open Source Toolkit For LLM Distillation ⭐540 · Updated 2 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ⭐196 · Updated 8 months ago
- A collection of all available inference solutions for the LLMs ⭐81 · Updated 3 weeks ago
- Self-host LLMs with vLLM and BentoML ⭐94 · Updated this week
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language Models… ⭐207 · Updated 4 months ago
- ⭐113 · Updated 5 months ago
- This project showcases an LLMOps pipeline that fine-tunes a small-size LLM model to prepare for the outage of the service LLM. ⭐302 · Updated last month
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ⭐226 · Updated 11 months ago
- OpenAI compatible API for TensorRT LLM triton backend ⭐201 · Updated 7 months ago
- Comparison of Language Model Inference Engines ⭐208 · Updated 3 months ago
- awesome synthetic (text) datasets ⭐264 · Updated 4 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ⭐149 · Updated this week
- Tutorial for building LLM router ⭐187 · Updated 8 months ago
- Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang… ⭐375 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai ⭐69 · Updated last month
- ⭐180 · Updated 5 months ago
- A Lightweight Library for AI Observability ⭐237 · Updated last month
- ⭐173 · Updated last week
- A throughput-oriented high-performance serving framework for LLMs ⭐773 · Updated 6 months ago
- Advanced Quantization Algorithm for LLMs/VLMs. ⭐394 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ⭐289 · Updated last month
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ⭐158 · Updated 6 months ago
- Let's build better datasets, together! ⭐256 · Updated 3 months ago
- A family of compressed models obtained via pruning and knowledge distillation ⭐330 · Updated 4 months ago