Snowflake-Labs / vllm
☆14 Updated last month
Alternatives and similar repositories for vllm:
Users who are interested in vllm are comparing it to the libraries listed below.
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆51 Updated last week
- ☆27 Updated 4 months ago
- Cray-LM unified training and inference stack. ☆21 Updated last month
- Train, tune, and infer Bamba model ☆86 Updated 2 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 Updated last year
- a pipeline for using api calls to agnostically convert unstructured data into structured training data ☆30 Updated 6 months ago
- Make triton easier ☆47 Updated 9 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆70 Updated last month
- Dolomite Engine is a library for pretraining/finetuning LLMs ☆44 Updated this week
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 Updated last year
- Example ML projects that use the Determined library. ☆29 Updated 6 months ago
- XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval ☆49 Updated 9 months ago
- Sentence Embedding as a Service ☆15 Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆25 Updated this week
- A place to store reusable transformer components of my own creation or found on the interwebs ☆48 Updated this week
- Utilities for Training Very Large Models ☆58 Updated 6 months ago
- ☆47 Updated 6 months ago
- 🤝 Trade any tensors over the network ☆30 Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆59 Updated 5 months ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆61 Updated 7 months ago
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp ☆14 Updated last month
- ☆21 Updated 3 weeks ago
- ☆43 Updated last year
- Some microbenchmarks and design docs before commencement ☆12 Updated 4 years ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆53 Updated last month
- Cortex-compatible model server for Python and TensorFlow ☆17 Updated 2 years ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆35 Updated 10 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 Updated 3 months ago
- Tools for merging pretrained large language models. ☆19 Updated 9 months ago
- Aioli: A unified optimization framework for language model data mixing ☆22 Updated 2 months ago