A high-throughput and memory-efficient inference and serving engine for LLMs
☆11Sep 4, 2025Updated 6 months ago
Alternatives and similar repositories for vllm
Users that are interested in vllm are comparing it to the libraries listed below.
- ☆11Oct 11, 2023Updated 2 years ago
- ☆12Updated this week
- Inference Llama 2 in one file of pure Haskell (A port of llama2.c from Andrej Karpathy)☆14Oct 17, 2025Updated 5 months ago
- [EMNLP 2022] RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees☆11Jul 15, 2023Updated 2 years ago
- ☆15Dec 22, 2023Updated 2 years ago
- This Repo Contains Script To Fine Tune Open Source Models Using Unsloth by using UI with simple click and progress☆11Oct 3, 2024Updated last year
- Code for data reduction and analysis of Galaxy Zoo 2☆14May 20, 2016Updated 9 years ago
- arXiv-Chat: An AI research assistant and Discord bot☆13Jul 16, 2023Updated 2 years ago
- The Swift Programming Language☆12Updated this week
- Inference Llama 2 in one file of pure C☆12Nov 17, 2023Updated 2 years ago
- ☆13Oct 25, 2024Updated last year
- Port of Facebook's LLaMA model in C/C++☆13Updated this week
- Android wrapper for Inference Llama 2 in one file of pure C☆18Aug 21, 2023Updated 2 years ago
- The Swift Programming Language☆19Jan 8, 2021Updated 5 years ago
- Inference Llama 2 in one file of pure Cuda☆17Aug 20, 2023Updated 2 years ago
- ☆12Dec 19, 2024Updated last year
- Template repo for Python projects, especially those focusing on machine learning and/or deep learning.☆15Jan 14, 2026Updated 2 months ago
- a temporal graph analytics library based on Flink Stateful Functions☆11Jun 8, 2023Updated 2 years ago
- open-source visual bookmark manager built with next.js. simple, and easy to set up☆25Nov 23, 2024Updated last year
- Generic build server☆64May 25, 2014Updated 11 years ago
- A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models☆49Dec 24, 2025Updated 3 months ago
- Unofficial reimplementation of ViR: Vision Retention Networks by Hatamizadeh et al. (https://arxiv.org/abs/2310.19731)☆18Jul 26, 2024Updated last year
- Model implementation and explorative UI for the paper "Towards Cost-Optimal Query Processing in the Cloud". Slides: https://bit.ly/37ZfeP…☆15Sep 17, 2025Updated 6 months ago
- Manages vllm-nccl dependency☆17Jun 3, 2024Updated last year
- Infrastructure for researching self-driving databases☆27Jul 2, 2025Updated 8 months ago
- ☆13Feb 22, 2023Updated 3 years ago
- Distributed consensus system with Map interface based on Apache Ratis☆27Nov 27, 2023Updated 2 years ago
- Docker packaging for Apache Flink Stateful Functions☆18Sep 19, 2023Updated 2 years ago
- Data preprocessing for fine-tuned Qwen models☆13Jan 8, 2024Updated 2 years ago
- Dockerfile examples for reproducing package cache (e.g., `/etc/apk/cache`)☆29Sep 16, 2023Updated 2 years ago
- A Benchmark Harness for Systematic and Robust Evaluation of Streaming State Stores☆17Apr 24, 2024Updated last year
- This is the repository for TimelineQA, a benchmark for querying lifelogs.☆26Jul 5, 2023Updated 2 years ago
- An experiment to see if chatgpt can improve the output of the stanford alpaca dataset☆12Mar 29, 2023Updated 3 years ago
- multilabel categorical crossentropy☆15Apr 26, 2020Updated 5 years ago
- A sample app to debug and validate cellular modems on balena devices☆13Jun 5, 2019Updated 6 years ago
- Deep learning examples for the Instant Super Computer☆20Jan 28, 2026Updated 2 months ago
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering querie…☆31Jun 14, 2023Updated 2 years ago
- Javascript / node.js code to read FCS flow cytometry data☆18Jun 19, 2023Updated 2 years ago
- Frontend (and soon also middleware and backend) for a new, open-source image generation platform.☆14Nov 5, 2022Updated 3 years ago