A high-throughput and memory-efficient inference and serving engine for LLMs
☆55Dec 11, 2023Updated 2 years ago
Alternatives and similar repositories for vllm-release
Users that are interested in vllm-release are comparing it to the libraries listed below.
- Repository for go shared libraries (for now).☆11Dec 1, 2025Updated 5 months ago
- Prompt + regex lab☆10Nov 22, 2023Updated 2 years ago
- Detecting Drift in a Diabetes Dataset using Taipy☆12May 19, 2025Updated last year
- CSS-LM: Contrastive Semi-supervised Fine-tuning of Pre-trained Language Models☆12Jul 1, 2023Updated 2 years ago
- A proxy for Google Bard LLM☆10Nov 2, 2023Updated 2 years ago
- Test-Time Memory Framework: Control Hallucinations in Foundation Models☆11Nov 4, 2025Updated 6 months ago
- Code for paper "Out-of-domain detection for natural language understanding in dialog systems"☆10May 27, 2022Updated 4 years ago
- [ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression☆47Aug 7, 2025Updated 9 months ago
- ☆19Aug 23, 2025Updated 9 months ago
- Tools for formatting large language model prompts.☆13Dec 19, 2023Updated 2 years ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆28May 19, 2026Updated last week
- Repository for the ACL'22 paper "So Different Yet So Alike! Constrained Unsupervised Text Style Transfer"☆16Jan 19, 2024Updated 2 years ago
- torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for Diffusion Models.☆25Mar 29, 2024Updated 2 years ago
- Tiny evaluation of leading LLMs on competitive programming problems☆14Apr 10, 2026Updated last month
- A taxonomy for open source cryptocurrency, blockchain, and decentralized ecosystems☆28Oct 23, 2022Updated 3 years ago
- OpenSource deployment made easy☆10Jun 13, 2015Updated 10 years ago
- AI Powered Dockerfile Generator Using Llama3.1 with GROQ☆11Oct 24, 2024Updated last year
- Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon☆16May 8, 2025Updated last year
- Simulate System76 EC with area8051 emulator☆12Mar 2, 2024Updated 2 years ago
- Sentence Embedding as a Service☆15Jun 30, 2025Updated 10 months ago
- Ongoing research training transformer models at scale☆18Jul 27, 2023Updated 2 years ago
- ☆19Dec 31, 2025Updated 4 months ago
- HTML/XML aware reverse proxy☆17Feb 16, 2026Updated 3 months ago
- ☆868Dec 8, 2023Updated 2 years ago
- Official inference library for Mistral models☆10,806Apr 20, 2026Updated last month
- ☆48Updated this week
- Autonomous Traversal and Object Detection for Rovers☆16Updated this week
- handle push notifications & other stuff☆10May 22, 2023Updated 3 years ago
- Tensor library for machine learning☆17Jul 13, 2023Updated 2 years ago
- Convert source code to LLM ready knowledge base☆33Dec 30, 2025Updated 4 months ago
- node.js Lutron RadioRa 2 control module - to control lighting, shades, etc.☆12Nov 14, 2017Updated 8 years ago
- Research Software Design by Example☆13Sep 7, 2025Updated 8 months ago
- ☆59May 21, 2026Updated last week
- ☆14Jan 21, 2025Updated last year
- Cloud Streams is a cloud-native tool that captures and processes cloud events in real-time, enabling efficient event-driven architectures…☆18Apr 30, 2026Updated 3 weeks ago
- Fast model deployment on AWS EC2☆14Feb 25, 2024Updated 2 years ago
- Source and documentation for development of autopilot for a surface vessel☆15Jun 3, 2015Updated 10 years ago
- Code-Langchain☆44Feb 20, 2024Updated 2 years ago
- chatGPT 'Autonomous Agent' in Node.js, written/runs in Termux. Sandboxed REPL access, Termux:API interface, chain-of-thought Question-Obs…☆16May 12, 2023Updated 3 years ago