LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, MoE expert parallelism, OpenAI-compatible serving
☆142 · Mar 28, 2026 · Updated last week
Alternatives and similar repositories for mini-infer
Users that are interested in mini-infer are comparing it to the libraries listed below.
- A collection of memes and stickers from social platforms ☆69 · Feb 24, 2026 · Updated last month
- M-Cube (M³): Multi-thinking, Multimodal, Multi-verification Patent Drafting Assistant ☆119 · Mar 15, 2026 · Updated 3 weeks ago
- ☆68 · Mar 23, 2026 · Updated 2 weeks ago
- Programming Massively Parallel Processors (4th Ed.): study notes, exercise solutions, and CUDA implementations ☆102 · Jan 25, 2026 · Updated 2 months ago
- Muxify is a VSCode extension that allows you to visually manage tmux sessions, windows, and panes directly from the sidebar - no need to … ☆63 · Feb 1, 2026 · Updated 2 months ago
- Controllable, Reproducible, Evaluable Agent Platform ☆67 · Mar 29, 2026 · Updated last week
- Data and Codes for Our Paper "PEOD: A Pixel-Aligned Event-RGB Benchmark for Object Detection under Challenging Conditions" ☆125 · Jan 16, 2026 · Updated 2 months ago
- geo-cultural-encoding ☆46 · Jan 6, 2026 · Updated 3 months ago
- A music API built with Deno for searching, streaming, and exploring music data from YouTube Music, YouTube, and Last.fm. ☆187 · Jan 19, 2026 · Updated 2 months ago
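- An ultra-fast sorting algorithm written by a first-year high school student with AI assistance; it offers adaptive behavior and other features and has reached industrial-grade standards ☆40 · Jan 24, 2026 · Updated 2 months ago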
- Terminal-first AI assistant for software engineering tasks (inspired by Claude Code v2.0.67) ☆115 · Mar 28, 2026 · Updated last week
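- A RuoYi server-side scaffold implemented on Go-Zero, providing a complete permission system, multi-tenant support, RBAC access control, menu management, and more; suited to quickly building enterprise-grade admin back ends. ☆186 · Jan 26, 2026 · Updated 2 months ago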
- [ICLR 2026] "DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing" (Official Implementation) ☆93 · Mar 4, 2026 · Updated last month
- Kakobuy Spreadsheet features 3,000+ trending products from Weidian, Taobao, and 1688, with affordable new arrivals added daily. Exp… ☆68 · Mar 21, 2026 · Updated 2 weeks ago
- A production-grade online adult education project based on Spring Cloud. It is split into a student side and an admin side, and includes a learning service, a coupon service, a course-recommendation AI Agent, and more. ☆285 · Feb 14, 2026 · Updated last month
- Edge-native web analytics on Cloudflare. High-throughput via Durable Objects, tiered D1/R2 storage for infinite retention, and privacy-fi… ☆99 · Apr 1, 2026 · Updated last week
- A client application built on CLIProxy - 霖君 ☆19 · Feb 6, 2026 · Updated 2 months ago
- Stop reading logs. Start watching them. MermaidTrace is a specialized logging tool that automatically generates Mermaid JS sequence diag… ☆86 · Mar 6, 2026 · Updated last month
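- A Web3 learning and experimentation front-end project based on the Next.js App Router, for practicing common scenarios such as wallet connection, on-chain queries, and simple transfers ☆76 · Jan 30, 2026 · Updated 2 months ago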
- AutoR takes a research goal, runs a fixed 8-stage pipeline with Claude Code, and requires explicit human approval after every stage befor… ☆92 · Updated this week
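- (With dataset) A PyTorch implementation of the lightweight MobileNetV2 CNN model for a 20-class image classification task on an ImageNet subset, covering the full workflow of model training, loss-curve plotting, and visualization of convolution kernels and intermediate-layer feature maps, with trained weight files included. ☆67 · Jan 30, 2026 · Updated 2 months ago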
- Pluggable role definitions for AI coding agents: one command turns Claude Code / Cursor / OpenCode / Codex into a specialized profession… ☆65 · Mar 28, 2026 · Updated last week
- 🧠 AI-powered Personalized Exam System: Integrating OpenPangu LLM, Knowledge Graph RAG, and BKT algorithm for adaptive question generatio… ☆289 · Mar 30, 2026 · Updated last week
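- My Bluetooth earbuds are small and I never bought a case for them, so I keep forgetting to bring them or misplacing them; that is why I built a locator and scanner for Bluetooth devices and earbuds. Manufacturers generally offer an app that can locate their more expensive earbuds, but my earbuds are the Huawei Freeclip2 and my phone is a Xiaomi 17Pro, so I could not locate them, which is why I built this Ap… ☆70 · Mar 11, 2026 · Updated 3 weeks ago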
- Coze MCP and Skill Management for OpenClaw ☆86 · Mar 11, 2026 · Updated 3 weeks ago
- AI-driven quality & governance MCP Server for dbt projects. Audit coverage, profile data, detect schema drift, and auto-generate document… ☆91 · Mar 23, 2026 · Updated 2 weeks ago
- A cross-platform MCP Server manager for Cursor, Claude, Windsurf, Zed & TRAE. Features one-click installation, multi-client sync, and a c… ☆89 · Mar 6, 2026 · Updated last month
- A Modern, Ad Free And Simple Anime Watching Site ☆58 · Updated this week
- Put some Christmas vibes to GitHub profile. ☆60 · Dec 26, 2025 · Updated 3 months ago
- Ultra-minimal AI chat UI: 30s deploy, no sign-up; OpenAI-compatible; RAG + vision + web parsing; plugins/adapters. ☆59 · Feb 21, 2026 · Updated last month
- ☆63 · Mar 22, 2026 · Updated 2 weeks ago
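- A labor arbitration assistance platform based on digital humans and (fine-tuned) large language models, supporting assisted generation of arbitration documents and legal consultation. ☆97 · Feb 25, 2026 · Updated last month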
- MCP server for YouTube: search videos, channels, playlists, and transcripts. Works with Claude, Cursor, Windsurf, and any MCP client. ☆72 · Mar 19, 2026 · Updated 2 weeks ago
- [ICLR 2026] "Does FLUX Already Know How to Perform Physically Plausible Image Composition?" (Official Implementation) ☆145 · Mar 26, 2026 · Updated last week
- TeraXLang - Triton Extension for LLM. As fast as FlashAttention FlashMLA, etc. ☆79 · Mar 20, 2026 · Updated 2 weeks ago
- Convert LangChain tools to FastMCP tools ☆71 · Jan 31, 2026 · Updated 2 months ago
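- `zl-backend` is an enterprise-grade back-end scaffold built on Spring Boot. The project uses a modular design and aims to provide an extensible, easy-to-maintain foundation for back-end development, suited to quickly building enterprise application systems. It provides complete security authentication, multi-module management, support for extension features, and more, helping development teams quickly start new projects… ☆219 · Jan 26, 2026 · Updated 2 months ago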
- 小可の聚集地: my blog built with Next.js, TypeScript, MDX, and TailwindCSS. ☆215 · Jan 26, 2026 · Updated 2 months ago
- Bridging generative world models and causal reasoning for realistic yet adversarial safety-critical scenario generation. ☆57 · Mar 29, 2026 · Updated last week