neuralmagic / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆15 · Updated last week
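For context, a minimal sketch of running offline inference with vLLM's Python API; the model name below is only a placeholder, not something this fork requires:

```python
# Minimal offline-inference sketch using vLLM's Python API.
# The model name is a placeholder; any Hugging Face causal LM that fits in memory works.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["The capital of France is"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each RequestOutput holds the prompt plus one or more generated completions.
    print(output.prompt, output.outputs[0].text)
```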
Alternatives and similar repositories for vllm
Users interested in vllm are comparing it to the libraries listed below
- A collection of all available inference solutions for LLMs ☆91 · Updated 7 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆278 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 · Updated last month
- GenAI Studio is a low-code platform that enables users to construct, evaluate, and benchmark GenAI applications. The platform also provides c… ☆50 · Updated last month
- ☆218 · Updated 8 months ago
- ScalarLM - a unified training and inference stack ☆85 · Updated 2 weeks ago
- Benchmark and optimize LLM inference across frameworks with ease ☆121 · Updated last month
- Cray-LM unified training and inference stack. ☆22 · Updated 8 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆60 · Updated this week
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆37 · Updated this week
- Self-host LLMs with vLLM and BentoML ☆151 · Updated last week
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data ☆43 · Updated this week
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆36 · Updated last week
- ☆68 · Updated 4 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna (see the Optuna sketch after this list) ☆55 · Updated 8 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆252 · Updated this week
- ☆64 · Updated 6 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated last year
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated 2 weeks ago
- ☆257 · Updated this week
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆58 · Updated 5 months ago
- Route LLM requests to the best model for the task at hand. ☆109 · Updated 3 weeks ago
- ☆46 · Updated last year
- GPTQ and efficient search for GGUF ☆51 · Updated last month
- Train, tune, and run inference with the Bamba model ☆134 · Updated 4 months ago
- Verbosity control for AI agents ☆65 · Updated last year
- ☆38 · Updated last year
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆131 · Updated 3 weeks ago
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Updated 7 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆77 · Updated 7 months ago
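As a companion to the GRPO/Optuna item above, a hedged sketch of wrapping a GRPO training run in an Optuna hyperparameter search; `run_grpo_training` and the reward-weight names are hypothetical stand-ins, not that repository's actual API:

```python
# Hypothetical Optuna search over GRPO hyperparameters and reward weights.
import optuna


def run_grpo_training(learning_rate: float, reward_weights: dict) -> float:
    # Placeholder: train with GRPO and return a validation score to maximize.
    raise NotImplementedError


def objective(trial: optuna.Trial) -> float:
    learning_rate = trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True)
    reward_weights = {
        "correctness": trial.suggest_float("correctness_weight", 0.5, 2.0),
        "format": trial.suggest_float("format_weight", 0.1, 1.0),
    }
    return run_grpo_training(learning_rate, reward_weights)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```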