opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆25 · Updated this week
Alternatives and similar repositories for vllm-tgis-adapter:
Users who are interested in vllm-tgis-adapter are comparing it to the libraries listed below:
- Train, tune, and infer Bamba model ☆87 · Updated 2 months ago
- Load compute kernels from the Hub ☆107 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai ☆70 · Updated last month
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 · Updated last year
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated last month
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 3 months ago
- Make triton easier ☆47 · Updated 9 months ago
- ☆24 · Updated 6 months ago
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆66 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆20 · Updated last month
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆111 · Updated 3 months ago
- Google TPU optimizations for transformers models ☆104 · Updated 2 months ago
- ☆14 · Updated last month
- Utils for Unsloth ☆63 · Updated last week
- The driver for LMCache core to run in vLLM ☆36 · Updated last month
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 · Updated last year
- DPO, but faster 🚀 ☆40 · Updated 3 months ago
- ☆176 · Updated this week
- Rust crate for some audio utilities ☆22 · Updated 3 weeks ago
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated 10 months ago
- implement llava using candle ☆14 · Updated 9 months ago
- ☆32 · Updated 9 months ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆35 · Updated 11 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated last year
- 👷 Build compute kernels ☆24 · Updated this week
- XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval ☆49 · Updated 9 months ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 · Updated 8 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆54 · Updated this week
- ☆22 · Updated this week