neuralmagic / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆13 Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- A collection of all available inference solutions for the LLMs ☆91 Updated 6 months ago
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data ☆42 Updated this week
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆88 Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 Updated 3 months ago
- ☆63 Updated 5 months ago
- Cray-LM unified training and inference stack. ☆22 Updated 7 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆223 Updated last week
- ScalarLM - a unified training and inference stack ☆55 Updated 3 weeks ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆78 Updated 6 months ago
- GenAI Studio is a low code platform to enable users to construct, evaluate, and benchmark GenAI applications. The platform also provide c… ☆48 Updated last week
- ☆68 Updated 3 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆223 Updated this week
- ☆217 Updated 7 months ago
- ☆46 Updated last year
- Verbosity control for AI agents ☆65 Updated last year
- Route LLM requests to the best model for the task at hand. ☆97 Updated 2 months ago
- ☆34 Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 Updated 10 months ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆36 Updated last month
- Train, tune, and infer Bamba model ☆131 Updated 2 months ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆59 Updated last month
- Train your own SOTA deductive reasoning model ☆104 Updated 5 months ago
- Self-host LLMs with vLLM and BentoML ☆140 Updated this week
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆74 Updated 5 months ago
- Utils for Unsloth https://github.com/unslothai/unsloth ☆134 Updated this week
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆54 Updated 3 months ago
- ☆55 Updated 2 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆55 Updated 7 months ago
- ☆238 Updated this week
- Just a bunch of benchmark logs for different LLMs ☆120 Updated last year