Snowflake-Labs / vllm
☆15Updated 2 months ago
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below
- A collection of reproducible inference engine benchmarks☆31Updated last month
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry☆42Updated last year
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)☆105Updated this week
- ☆30Updated last month
- A pipeline for using API calls to agnostically convert unstructured data into structured training data☆29Updated 8 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …☆59Updated 7 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging.☆36Updated last year
- Make triton easier☆47Updated 11 months ago
- ☆28Updated 4 months ago
- ☆99Updated this week
- ☆46Updated last week
- ☆29Updated 6 months ago
- 🤝 Trade any tensors over the network☆30Updated last year
- ☆21Updated 3 months ago
- Example ML projects that use the Determined library.☆32Updated 8 months ago
- ☆49Updated 6 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.☆33Updated last year
- Train, tune, and run inference with the Bamba model☆127Updated last month
- vLLM adapter for a TGIS-compatible gRPC server.☆30Updated this week
- FlexAttention w/ FlashAttention3 Support☆26Updated 7 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆60Updated this week
- A place to store reusable transformer components of my own creation or found on the interwebs☆56Updated 3 weeks ago
- Cortex-compatible model server for Python and TensorFlow☆17Updated 2 years ago
- The backend behind the LLM-Perf Leaderboard☆10Updated last year
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆55Updated 2 weeks ago
- Using FlexAttention to compute attention with different masking patterns☆43Updated 8 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆24Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore☆28Updated 8 months ago
- [EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality☆19Updated 3 months ago