neuralmagic / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆16 · Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- ScalarLM - a unified training and inference stack ☆97 · Updated 2 months ago
- Cray-LM unified training and inference stack. ☆22 · Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆63 · Updated 4 months ago
- A collection of all available inference solutions for LLMs ☆94 · Updated 11 months ago
- ☆67 · Updated 10 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆37 · Updated 4 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆287 · Updated this week
- LM engine is a library for pretraining/finetuning LLMs ☆113 · Updated this week
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data ☆49 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆391 · Updated this week
- Memory optimized Mixture of Experts ☆73 · Updated 6 months ago
- ☆219 · Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆220 · Updated last week
- Benchmark suite for LLMs from Fireworks.ai ☆89 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 2 months ago
- Self-host LLMs with vLLM and BentoML ☆168 · Updated 2 weeks ago
- ☆67 · Updated 8 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆261 · Updated last week
- Benchmark and optimize LLM inference across frameworks with ease ☆161 · Updated 4 months ago
- Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3- DeepS… ☆445 · Updated this week
- GenAI Studio is a low-code platform that enables users to construct, evaluate, and benchmark GenAI applications. The platform also provide c… ☆59 · Updated 3 weeks ago
- Cross-GPU KV Cache Marketplace ☆23 · Updated 2 months ago
- EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU ☆50 · Updated last year
- Efficient non-uniform quantization with GPTQ for GGUF ☆58 · Updated 4 months ago
- Enemies for your LLM ☆34 · Updated 3 weeks ago
- Aana SDK is a powerful framework for building AI-enabled multimodal applications. ☆56 · Updated 5 months ago
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆42 · Updated last week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆130 · Updated 4 months ago
- Example implementation of Iteration of Thought - Give it a star if you like the project ☆41 · Updated last year
- ☆92 · Updated 2 weeks ago