Snowflake-Labs / vllm
☆15 · Updated 3 weeks ago
Alternatives and similar repositories for vllm
Users interested in vllm are comparing it to the libraries listed below.
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated this week
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 · Updated 2 weeks ago
- ArcticTraining is a framework designed to simplify and accelerate post-training for large language models (LLMs) ☆219 · Updated this week
- The backend behind the LLM-Perf Leaderboard ☆10 · Updated last year
- Cray-LM unified training and inference stack ☆22 · Updated 8 months ago
- A collection of reproducible inference-engine benchmarks ☆33 · Updated 5 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Updated 11 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆267 · Updated this week
- LM engine is a library for pretraining and finetuning LLMs ☆67 · Updated last week
- ☆31 · Updated 10 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆90 · Updated last week
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated 2 years ago
- Train, tune, and run inference with the Bamba model ☆132 · Updated 4 months ago
- Experiments with inference on Llama ☆104 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server ☆41 · Updated this week
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆94 · Updated this week
- ☆98 · Updated last month
- A Python wrapper around Hugging Face's TGI (text-generation-inference) and TEI (text-embedding-inference) servers ☆33 · Updated 2 weeks ago
- Repo hosting code and materials related to speeding up LLM inference using token merging ☆36 · Updated 2 months ago
- 🤝 Trade any tensors over the network ☆30 · Updated 2 years ago
- 👷 Build compute kernels ☆149 · Updated this week
- Make Triton easier ☆47 · Updated last year
- ☆46 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated 11 months ago
- Easy and Efficient Quantization for Transformers ☆203 · Updated 3 months ago
- Source code for the collaborative reasoner research project at Meta FAIR ☆103 · Updated 5 months ago
- ☆48 · Updated last year
- Example ML projects that use the Determined library ☆32 · Updated last year
- Hugging Face Inference Toolkit, used to serve transformers, sentence-transformers, and diffusers models ☆87 · Updated 2 weeks ago