neuralmagic / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆16Updated this week
Alternatives and similar repositories for vllm
Users that are interested in vllm are comparing it to the libraries listed below
- Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72…☆405Updated this week
- Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, and full end-to-end reference exampl…☆246Updated last week
- Benchmark and optimize LLM inference across frameworks with ease☆151Updated 3 months ago
- A collection of all available inference solutions for the LLMs☆93Updated 9 months ago
- GenAI Studio is a low-code platform to enable users to construct, evaluate, and benchmark GenAI applications. The platform also provides c…☆55Updated 2 weeks ago
- ScalarLM - a unified training and inference stack☆93Updated last month
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research☆273Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference☆62Updated 3 months ago
- Self-host LLMs with vLLM and BentoML☆163Updated last month
- Cray-LM unified training and inference stack.☆22Updated 11 months ago
- Route LLM requests to the best model for the task at hand.☆147Updated last week
- Memory optimized Mixture of Experts☆72Updated 5 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆59Updated 2 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference☆357Updated this week
- Luth is a state-of-the-art series of fine-tuned LLMs for French☆41Updated 2 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆246Updated last week
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data☆44Updated last week
- Train, tune, and infer Bamba model☆137Updated 6 months ago
- ☆273Updated last week
- A framework for fine-tuning retrieval-augmented generation (RAG) systems.☆137Updated last week
- ☆68Updated 7 months ago
- Nexusflow function call, tool use, and agent benchmarks.☆30Updated last year
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆84Updated 9 months ago
- Framework-Agnostic RL Environments for LLM Fine-Tuning☆40Updated last month
- ☆36Updated 4 months ago
- ☆67Updated 9 months ago
- Training setup for Langchain's Open Deep Research☆74Updated 4 months ago
- Example implementation of Iteration of Thought - Give a star if you like the project☆41Updated last year
- Aana SDK is a powerful framework for building AI enabled multimodal applications.☆55Updated 4 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)☆263Updated this week