neuralmagic / vllmLinks

A high-throughput and memory-efficient inference and serving engine for LLMs

☆13

Alternatives and similar repositories for vllm

Users that are interested in vllm are comparing it to the libraries listed below

Sorting:

facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆73Updated 2 weeks ago
samchaineau / llm_slerp_generation
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆36Updated last year
raphaelmansuy / iteration_of_tought
Example implementation of Iteration of Tought - Gives a star if you like the project
☆42Updated 6 months ago
flowaicom / flow-judge
Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…
☆73Updated 8 months ago
mani-kantap / llm-inference-solutions
A collection of all available inference solutions for the LLMs
☆91Updated 4 months ago
bentoml / BentoLMDeploy
Self-host LLMs with LMDeploy and BentoML
☆20Updated last week
Scale3-Labs / dspy-examples
A collection of example AI programs built using DSPy and maitained by the Langtrace AI team.
☆33Updated 7 months ago
facebookresearch / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆32Updated 2 months ago
SalesforceAIResearch / perfcodegen
☆40Updated 2 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆71Updated 4 months ago
AgnostiqHQ / multi-agent-llm
Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)
☆115Updated 5 months ago
arcee-ai / DAM
☆52Updated 8 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55Updated 5 months ago
cray-lm / cray-lm
Cray-LM unified training and inference stack.
☆22Updated 5 months ago
hetailang / SqueezeAttention
☆37Updated 9 months ago
Zoeyyao27 / SirLLM
This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM
☆59Updated last year
The-Swarm-Corporation / OmniParse
Transform unstructured documents into actionable, structured data with enterprise-grade precision and reliability, ready for large-scale …
☆19Updated 2 weeks ago
axolotl-ai-cloud / axolotl-cookbook
☆34Updated 4 months ago
BBischof / yapping
Verbosity control for AI agents
☆64Updated last year
google-deepmind / asyncdiloco
☆45Updated last year
ZihanWang314 / coeCheck
☆19Updated 4 months ago
substratusai / vllm-docker
☆62Updated 3 months ago
Cerebras / DocChat
GPT-4 Level Conversational QA Trained In a Few Hours
☆63Updated 10 months ago
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆130Updated last month
bipul1010 / agents_tutorial
☆19Updated 11 months ago
HarleyCoops / StoneyNakoda
A locally trained model of Stoney Nakoda has been developed and released. You can access the working model here or train your own instanc…
☆10Updated 3 months ago
aniketmaurya / fastserve-ai
Machine Learning Serving focused on GenAI with simplicity as the top priority.
☆59Updated last week
kubernetes-bad / reward-composer
Lego for GRPO
☆28Updated last month
Cornell-RelaxML / yaqa-quantization
☆41Updated 3 weeks ago
zozoheir / tinyllm
Develop, evaluate and monitor LLM applications at scale
☆100Updated 7 months ago