neuralmagic / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆16Updated this week
Alternatives and similar repositories for vllm
Users that are interested in vllm are comparing it to the libraries listed below
- ScalarLM - a unified training and inference stack☆95Updated 2 months ago
- Benchmark and optimize LLM inference across frameworks with ease☆155Updated 4 months ago
- A collection of all available inference solutions for the LLMs☆94Updated 10 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research☆277Updated this week
- Cray-LM unified training and inference stack.☆22Updated 11 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference☆375Updated this week
- Memory optimized Mixture of Experts☆72Updated 5 months ago
- Self-host LLMs with vLLM and BentoML☆163Updated this week
- Repo hosting codes and materials related to speeding LLMs' inference using token merging.☆37Updated 3 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference☆63Updated 4 months ago
- Route LLM requests to the best model for the task at hand.☆166Updated this week
- ☆218Updated 11 months ago
- AI-Driven Research Systems (ADRS)☆117Updated last month
- ☆68Updated 7 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM☆190Updated last week
- Machine Learning Serving focused on GenAI with simplicity as the top priority.☆59Updated 2 weeks ago
- Perplexity open source garden for inference technology☆332Updated 3 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆267Updated last month
- Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72…☆419Updated this week
- Aana SDK is a powerful framework for building AI enabled multimodal applications.☆55Updated 4 months ago
- ☆67Updated 9 months ago
- ☆47Updated 8 months ago
- Simple examples using Argilla tools to build AI☆57Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆256Updated last week
- LM engine is a library for pretraining/finetuning LLMs☆110Updated last week
- ☆274Updated this week
- LLM Serving Performance Evaluation Harness☆82Updated 10 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆59Updated 3 months ago
- Example implementation of Iteration of Thought - Give a star if you like the project☆41Updated last year
- vLLM adapter for a TGIS-compatible gRPC server.☆47Updated this week