IBM / vllm
vLLM with support for span semantics
☆18 Updated last week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the repositories listed below
- Benchmarking tool for assessing LLM models' performance across different hardware ☆17 Updated last year
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆40 Updated 2 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆90 Updated last week
- A collection of all available inference solutions for LLMs ☆91 Updated 7 months ago
- ☆62 Updated 4 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 Updated last month
- Nexusflow function call, tool use, and agent benchmarks. ☆29 Updated 10 months ago
- Google TPU optimizations for transformers models ☆120 Updated 8 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆112 Updated 2 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆96 Updated last week
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication ☆57 Updated 10 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆52 Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆41 Updated this week
- Estimating hardware and cloud costs of LLMs and transformer projects ☆18 Updated 3 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆252 Updated this week
- ScalarLM - a unified training and inference stack ☆85 Updated 2 weeks ago
- Train, tune, and infer Bamba model ☆134 Updated 4 months ago
- Granite 3.1 Language Models ☆128 Updated 3 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆60 Updated this week
- ☆26 Updated 2 months ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 11 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆111 Updated 6 months ago
- EXO Gym is an open-source Python toolkit that facilitates distributed AI research. ☆78 Updated last month
- Docker image for NVIDIA GH200 machines - optimized for vllm serving and hf trainer finetuning ☆50 Updated 7 months ago
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆52 Updated last month
- Benchmark suite for LLMs from Fireworks.ai ☆83 Updated last week
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆29 Updated 7 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆119 Updated last week
- AirLLM 70B inference with single 4GB GPU ☆14 Updated 3 months ago
- ☆55 Updated 11 months ago