Snowflake-Labs / vllm
☆15 Updated last week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 Updated 3 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆201 Updated this week
- A collection of reproducible inference engine benchmarks ☆32 Updated 4 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆80 Updated 3 weeks ago
- experiments with inference on llama ☆104 Updated last year
- Unified storage framework for the entire machine learning lifecycle ☆156 Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆37 Updated this week
- Cray-LM unified training and inference stack. ☆22 Updated 6 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆154 Updated 3 months ago
- ☆31 Updated 9 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆61 Updated 10 months ago
- 👷 Build compute kernels ☆106 Updated last week
- Crispy reranking models by Mixedbread ☆34 Updated last month
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" ☆65 Updated last year
- minimal pytorch implementation of bm25 (with sparse tensors) ☆104 Updated last year
- 🤝 Trade any tensors over the network ☆30 Updated last year
- Train, tune, and infer Bamba model ☆131 Updated 2 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆139 Updated last year
- ☆45 Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 Updated this week
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆88 Updated this week
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆33 Updated 3 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆215 Updated this week
- LM engine is a library for pretraining/finetuning LLMs ☆63 Updated this week
- Example ML projects that use the Determined library. ☆32 Updated 11 months ago
- ☆49 Updated 9 months ago
- ☆118 Updated last year
- Code for NeurIPS LLM Efficiency Challenge ☆59 Updated last year
- Truly flash T5 realization! ☆70 Updated last year