Build your own high-performance LLM inference engine in C++ and CUDA - a smaller version of vLLM
☆87 · Apr 8, 2026 · Updated this week
Alternatives and similar repositories for tiny-vllm
Users that are interested in tiny-vllm are comparing it to the libraries listed below.
- Lossless normalization of uppercase characters ☆11 · Jul 3, 2023 · Updated 2 years ago
- ☆17 · Updated this week
- Porting DWM3000 C code library to work with ATMEGA328P ☆18 · Feb 13, 2022 · Updated 4 years ago
- A simple DW3000 Library for the ESP32 ☆30 · Aug 25, 2025 · Updated 7 months ago
- Stream Claude Code's hidden output (thinking, tool calls, subagents) to a separate terminal in real-time ☆114 · Updated this week
- ☆18 · Nov 11, 2025 · Updated 4 months ago
- A playground to make it easy to try crazy things ☆33 · Feb 13, 2026 · Updated last month
- A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn. ☆135 · Mar 31, 2026 · Updated last week
- Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval And Synthesis For SLMs ☆56 · Oct 7, 2025 · Updated 6 months ago
- Video chat with Modal's mascots, Moe and Dal, about Modal and its documentation. ☆60 · Updated this week
- The Transformer in PyTorch ☆13 · Aug 7, 2024 · Updated last year
- ☆67 · Mar 2, 2026 · Updated last month
- [`CVPR 2024`] Official code repository for "'Previously On ...' From Recaps to Story Summarization". https://arxiv.org/abs/2405.11487 ☆13 · Feb 21, 2025 · Updated last year
- ☆27 · Apr 7, 2025 · Updated last year
- ☆90 · Dec 16, 2025 · Updated 3 months ago
- An evolutionary many-objective approach to multiview clustering using feature and relational data ☆13 · Oct 20, 2021 · Updated 4 years ago
- ☆10 · May 15, 2018 · Updated 7 years ago
- [ACM MM 2022] This is the official implementation of "Temporal Sentiment Localization: Listen and Look in Untrimmed Videos" ☆18 · Feb 14, 2025 · Updated last year
- ☆18 · Feb 20, 2024 · Updated 2 years ago
- Training framework for Large Behavioral Models ☆27 · Sep 17, 2025 · Updated 6 months ago
- A straightforward method to reduce your LLM inference API costs and token usage. ☆22 · May 18, 2025 · Updated 10 months ago
- Minimal TPU implementation with 8x8 systolic array and PyTorch integration ☆57 · Jan 26, 2026 · Updated 2 months ago
- A Beginner's Guide to Monetizing Your Python AI Chatbot ☆16 · Apr 22, 2025 · Updated 11 months ago
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- Composition of Multimodal Language Models From Scratch ☆15 · Aug 16, 2024 · Updated last year
- Implementation of 12 AI agents evaluation techniques ☆39 · Jul 31, 2025 · Updated 8 months ago
- Building Andrej Karpathy's micrograd from scratch ☆48 · May 13, 2023 · Updated 2 years ago
- Official code for EnvSDD (Environmental Sound Deepfake Detection) ☆31 · Dec 13, 2025 · Updated 3 months ago
- From a+b to sparsemax(QK^T)V in Triton! ☆29 · Jun 19, 2025 · Updated 9 months ago
- ☆84 · Aug 27, 2025 · Updated 7 months ago
- Auto-generate robust and reliable CSS, XPath and RanoreXPath selectors in your Chrome DevTools. ☆16 · Nov 14, 2019 · Updated 6 years ago
- LL-HLS implementation written in Python3 ☆50 · Dec 10, 2024 · Updated last year
- A curation of awesome portfolio website ideas for developers and designers to draw inspiration from. Raise a pull request to add more. 💜… ☆17 · Apr 15, 2025 · Updated 11 months ago
- A comprehensive hands-on project for learning GPU programming with CUDA and HIP, covering fundamental concepts through advanced optimizat… ☆35 · Nov 20, 2025 · Updated 4 months ago
- Implement GPT-OSS 20B & 120B C++ inference from scratch on AMD GPUs ☆170 · Oct 25, 2025 · Updated 5 months ago
- Synthetic data generation for evaluating LLM symbolic and logic reasoning ☆22 · Mar 6, 2026 · Updated last month
- A Transformer Model Exploiting Histology Images and Spatial Gene Expression ☆22 · Mar 18, 2025 · Updated last year
- Writing FLUX in Triton ☆42 · Sep 22, 2024 · Updated last year
- Just a Playwright (python) tool practice ☆10 · Apr 3, 2025 · Updated last year