neuralmagic / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆16 · Updated this week
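For context on what the engine above provides, here is a minimal sketch of offline batch inference with vLLM's Python API. It assumes the `vllm` package is installed and uses `facebook/opt-125m` purely as a small placeholder model.

```python
# Minimal offline-inference sketch with vLLM (assumes `pip install vllm` and that
# the placeholder model can be fetched from the Hugging Face Hub).
from vllm import LLM, SamplingParams

prompts = [
    "Explain speculative decoding in one sentence.",
    "What makes PagedAttention memory-efficient?",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # small model chosen only to keep the example light
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```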
Alternatives and similar repositories for vllm
Users interested in vllm are comparing it to the libraries listed below.
- GenAI Studio is a low-code platform to enable users to construct, evaluate, and benchmark GenAI applications. The platform also provides c… ☆54 · Updated last week
- ScalarLM - a unified training and inference stack ☆94 · Updated 3 weeks ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆37 · Updated 2 months ago
- A collection of all available inference solutions for LLMs ☆93 · Updated 9 months ago
- Benchmark and optimize LLM inference across frameworks with ease ☆141 · Updated 2 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 · Updated 2 months ago
- Route LLM requests to the best model for the task at hand. ☆143 · Updated this week
- Cray-LM unified training and inference stack. ☆22 · Updated 10 months ago
- ☆153 · Updated this week
- Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72… ☆388 · Updated this week
- Self-host LLMs with vLLM and BentoML (see the serving sketch after this list) ☆161 · Updated 2 weeks ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆327 · Updated last week
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data ☆44 · Updated this week
- Perplexity open source garden for inference technology ☆287 · Updated 2 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 · Updated last week
- ☆66 · Updated 8 months ago
- ☆68 · Updated 6 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last year
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆265 · Updated this week
- Simple examples using Argilla tools to build AI ☆56 · Updated last year
- Efficient non-uniform quantization with GPTQ for GGUF ☆53 · Updated 2 months ago
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆39 · Updated last week
- Example implementation of Iteration of Thought - Give it a star if you like the project ☆41 · Updated 11 months ago
- Train, tune, and run inference with the Bamba model ☆137 · Updated 6 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆225 · Updated last week
- vLLM adapter for a TGIS-compatible gRPC server. ☆45 · Updated this week
- Fine-tune an LLM to perform batch inference and online serving. ☆114 · Updated 6 months ago
- ☆219 · Updated 10 months ago
- ☆47 · Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆140 · Updated this week
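Several of the entries above center on serving (the BentoML self-hosting guide and the batch/online serving example, for instance). As a rough sketch of what querying a vLLM deployment typically looks like: the snippet below assumes a server was started separately with `vllm serve facebook/opt-125m` (default address `http://localhost:8000`) and that the `openai` client package is installed; the endpoint address and model name are placeholders.

```python
# Minimal client sketch for a vLLM OpenAI-compatible server (assumes the server is
# already running, e.g. via `vllm serve facebook/opt-125m`, and `pip install openai`).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM server address; adjust if needed
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.completions.create(
    model="facebook/opt-125m",            # must match the model the server was launched with
    prompt="Summarize what an LLM serving engine does.",
    max_tokens=64,
)
print(response.choices[0].text)
```

Because the server exposes standard `/v1` endpoints, any OpenAI-compatible client or routing layer (such as the request-routing projects listed above) can talk to it the same way.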