neuralmagic / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆16 · Updated this week
Alternatives and similar repositories for vllm
Users interested in vllm are comparing it to the libraries listed below.
- Cray-LM: a unified training and inference stack · ☆22 · Updated 9 months ago
- A collection of available inference solutions for LLMs · ☆92 · Updated 8 months ago
- ScalarLM: a unified training and inference stack · ☆93 · Updated last week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM · ☆65 · Updated this week
- ArcticInference: a vLLM plugin for high-throughput, low-latency inference · ☆299 · Updated this week
- A repo hosting code and materials related to speeding up LLM inference using token merging · ☆37 · Updated last month
- ☆77 · Updated last week
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research · ☆259 · Updated this week
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray · ☆131 · Updated last month
- Benchmark and optimize LLM inference across frameworks with ease · ☆131 · Updated 2 months ago
- ☆267 · Updated this week
- Perplexity's open-source garden for inference technology · ☆182 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆266 · Updated last year
- Machine learning serving focused on GenAI, with simplicity as the top priority · ☆58 · Updated last month
- ☆47 · Updated last year
- Open Source Continuous Inference Benchmarking: GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X, and soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72… · ☆343 · Updated this week
- ☆218 · Updated 9 months ago
- GenAI Studio is a low-code platform that enables users to construct, evaluate, and benchmark GenAI applications. The platform also provides c… · ☆53 · Updated 2 months ago
- ☆68 · Updated 5 months ago
- Optimizing causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna · ☆58 · Updated 3 weeks ago
- IBM's development fork of https://github.com/huggingface/text-generation-inference · ☆62 · Updated last month
- ☆64 · Updated 7 months ago
- LM engine: a library for pretraining and finetuning LLMs · ☆74 · Updated last week
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs · ☆93 · Updated this week
- ☆36 · Updated 3 months ago
- Framework-Agnostic RL Environments for LLM Fine-Tuning · ☆38 · Updated last week
- Google TPU optimizations for transformers models · ☆122 · Updated 9 months ago
- ☆106 · Updated 2 weeks ago
- Simple examples using Argilla tools to build AI · ☆56 · Updated 11 months ago
- InstructLab Training Library: efficient fine-tuning with message-format data · ☆44 · Updated this week