opendatahub-io / vllm-tgis-adapter

vLLM adapter for a TGIS-compatible gRPC server.

☆45

Alternatives and similar repositories for vllm-tgis-adapter

Users who are interested in vllm-tgis-adapter are comparing it to the libraries listed below.

huggingface / kernel-builder
👷 Build compute kernels
☆192 Updated this week
foundation-model-stack / bamba
Train, tune, and run inference with the Bamba model
☆137 Updated 6 months ago
NVIDIA-NeMo / Automodel
PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
☆194 Updated this week
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆140 Updated this week
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84 Updated 2 weeks ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆100 Updated 6 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆46 Updated last year
samchaineau / llm_slerp_generation
Repo hosting code and materials related to speeding up LLM inference using token merging.
☆37 Updated 2 months ago
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60 Updated last year
character-ai / pipelining-sft
Simple and efficient DeepSeek V3 SFT using pipeline parallelism and expert parallelism, with both FP8 and BF16 training
☆101 Updated 4 months ago
facebookresearch / fastgen
Simple high-throughput inference library
☆150 Updated 6 months ago
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆124 Updated 10 months ago
IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
☆62 Updated 2 months ago
vllm-project / dashboard
vLLM performance dashboard
☆38 Updated last year
nexusflowai / NexusBench
Nexusflow function call, tool use, and agent benchmarks.
☆30 Updated 11 months ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry
☆42 Updated last year
runpod-workers / worker-sglang
SGLang is a fast serving framework for large language models and vision language models.
☆30 Updated 2 weeks ago
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆58 Updated last week
huggingface / kernels
Load compute kernels from the Hub
☆348 Updated this week
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆267 Updated last year
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆59 Updated last month
mistralai / mistral-evals
☆78 Updated 2 weeks ago
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆38 Updated 7 months ago
huggingface / huggingface-inference-toolkit
Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.
☆88 Updated 3 weeks ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆133 Updated last year
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆145 Updated 10 months ago
DeepAuto-AI / hip-attention
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
☆148 Updated last month
google-deepmind / asyncdiloco
☆47 Updated last year
LLM360 / crystalcoder-data-prep
Data preparation code for CrystalCoder 7B LLM
☆45 Updated last year
allenai / olmo-cookbook
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
☆57 Updated this week