Snowflake-Labs / vllm
☆15Updated 2 months ago
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- A collection of reproducible inference engine benchmarks☆31Updated 2 months ago
- ☆30Updated 7 months ago
- ☆47Updated 4 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …☆59Updated 8 months ago
- ☆47Updated 7 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry☆42Updated last year
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)☆28Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆69Updated last week
- 🤝 Trade any tensors over the network☆30Updated last year
- ☆44Updated last year
- A place to store reusable transformer components of my own creation or found on the interwebs☆56Updated last week
- vLLM adapter for a TGIS-compatible gRPC server.☆32Updated this week
- ☆47Updated 9 months ago
- ☆16Updated last year
- Cray-LM unified training and inference stack.☆22Updated 4 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging.☆36Updated last year
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆58Updated last month
- Make triton easier☆46Updated last year
- This is a fork of SGLang for hip-attention integration. Please refer to hip-attention for detail.☆14Updated this week
- ☆51Updated 7 months ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…☆14Updated last year
- Truly flash T5 realization!☆67Updated last year
- Creating Generative AI Apps which work☆17Updated 2 months ago
- Tutorial on how to convert machine learned models into ONNX☆16Updated 2 years ago
- Code for NeurIPS LLM Efficiency Challenge☆59Updated last year
- ☆56Updated last month
- DPO, but faster 🚀☆43Updated 6 months ago
- A pipeline that uses API calls to convert unstructured data into structured training data in a model-agnostic way☆30Updated 9 months ago
- Simple GRPO scripts and configurations.☆58Updated 4 months ago