a fun and educational take on vLLM
☆192Jan 25, 2026Updated 2 months ago
Alternatives and similar repositories for nano-vllm
Users who are interested in nano-vllm are comparing it to the libraries listed below.
- Notes and code for Programming Massively Parallel Processors☆13Mar 29, 2025Updated last year
- JAX bindings for the flash-attention3 kernels☆22Jan 2, 2026Updated 3 months ago
- Using RAG to generate data for model fine-tuning.☆13Apr 16, 2025Updated 11 months ago
- An ultra-light, ultra-flexible predictive coding framework written in pure Nim. Intended for microcontrollers.☆23Feb 23, 2026Updated last month
- Two implementations of ZeRO-1 optimizer sharding in JAX☆14Jun 11, 2023Updated 2 years ago
- How to quickly serve an LLM using FastAPI, Celery, and Redis☆17Aug 29, 2023Updated 2 years ago
- Distributed SDDMM Kernel☆12Jul 8, 2022Updated 3 years ago
- fmchisel: Efficient Compression and Training Algorithms for Foundation Models☆85Oct 23, 2025Updated 5 months ago
- This repository mainly records search-engine study notes for NLP algorithm engineers☆13Apr 9, 2022Updated 4 years ago
- ☆12Sep 18, 2024Updated last year
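- A multimodal language model for Chinese document understanding, supporting multimodal document information extraction and document embedding☆12Jun 26, 2022Updated 3 years ago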
- llm201n: neural networks zero to super hero. the bridge from micrograd to tinygrad!☆63Updated this week
- 3D Telecommunications project utilizing Holoportation technology to provide live volumetric capture. Used in one case to increase the re…☆21Updated this week
- ☆46Mar 31, 2025Updated last year
- NeuraChip Accelerator Simulator☆16Apr 26, 2024Updated last year
- Ludic – an LLM-RL library for the era of experience☆62Jan 9, 2026Updated 3 months ago
- Browser-based 3D perception explorer for Waymo, nuScenes, and Argoverse 2☆67Mar 21, 2026Updated 3 weeks ago
- ☆37Jan 25, 2026Updated 2 months ago
- Train an LLM to generate cracked Manim animations for mathematical concepts.☆23Mar 14, 2025Updated last year
- ☆27Feb 27, 2025Updated last year
- A simple attribution engine.☆34Feb 1, 2023Updated 3 years ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…☆13Aug 12, 2022Updated 3 years ago
- ☆10May 9, 2019Updated 6 years ago
- An HBM FPGA based SpMV Accelerator☆18Aug 29, 2024Updated last year
- Agentic Virtual Lab☆19Nov 30, 2025Updated 4 months ago
- Source code of paper "Machine Learning for Load Balancing in the Linux Kernel"☆24Sep 21, 2020Updated 5 years ago
- Universal atomic embedding based on CrystalTransformer☆25Jan 11, 2025Updated last year
- A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu…☆87Dec 12, 2025Updated 4 months ago
- ☆31Jul 21, 2025Updated 8 months ago
- Source code of WSiP model☆11Aug 14, 2022Updated 3 years ago
- Dotfile management with bare git☆22Mar 14, 2026Updated last month
- Rebuild YatSenOS On RISC-V 64.☆23Jan 6, 2022Updated 4 years ago
- This repo introduces how to build a full working example that serves your model using asynchronous Celery tasks and FastAPI. 🔥 …☆30May 21, 2024Updated last year
- SpMV using CUDA☆20Mar 5, 2018Updated 8 years ago
- Tsinghua University 2021 operating systems lab code☆16Jun 5, 2021Updated 4 years ago
- RTL generator for SpGEMM☆10Feb 2, 2021Updated 5 years ago
- An eBPF engine for capturing and processing POSIX signals.☆42May 9, 2023Updated 2 years ago
- A Vector Store written in Go - Supports hybrid retrieval over BM25, Flat, HNSW, IVF, PQ and IVFPQ Index with Quantization, Metadata Filte…☆112Oct 15, 2025Updated 6 months ago
- [TCAD 2021] Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA☆17Jul 7, 2022Updated 3 years ago