A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
☆328 Jun 10, 2025 Updated last year
Alternatives and similar repositories for swiftLLM
Users that are interested in swiftLLM are comparing it to the libraries listed below.
- High performance Transformer implementation in C++. ☆152 Jan 18, 2025 Updated last year
- Disaggregated serving system for Large Language Models (LLMs). ☆819 Apr 6, 2025 Updated last year
- NEO is an LLM inference engine built to solve the GPU memory crisis by CPU offloading ☆97 Jun 16, 2025 Updated 11 months ago
- ☆134 Nov 11, 2024 Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆506 Jan 8, 2026 Updated 5 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆964 Mar 29, 2026 Updated 2 months ago
- Efficient and easy multi-instance LLM serving ☆553 Mar 12, 2026 Updated 2 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆487 May 30, 2025 Updated last year
- A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems