An annotated nano_vllm repository that also adds MiniCPM4 support and the ability to register new models
☆179Aug 11, 2025Updated 8 months ago
Alternatives and similar repositories for nano_vllm_note
Users who are interested in nano_vllm_note are comparing it to the libraries listed below.
- some hpc project for learning☆26Aug 28, 2024Updated last year
- ☆20Apr 11, 2026Updated last week
- This repository mainly collects interview questions for large language model (LLM) algorithm engineers, along with my own answers☆27Oct 14, 2023Updated 2 years ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 8 months ago
- ☆13Jan 7, 2025Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.☆59Feb 6, 2026Updated 2 months ago
- Implementation and optimization of matrix multiplication on single CPU (HPC-THU-2023-Autumn)☆18Feb 27, 2024Updated 2 years ago
- A light llama-like llm inference framework based on the triton kernel.☆180Jan 5, 2026Updated 3 months ago
- ☆22Aug 20, 2025Updated 7 months ago
- An onnx-based quantitation tool.☆71Jan 8, 2024Updated 2 years ago
- ToyLLM: Learning LLM from Scratch☆25Updated this week
- A GPU FP32 computation method with Tensor Cores.☆26Dec 8, 2025Updated 4 months ago
- Implement Flash Attention using Cute.☆105Dec 17, 2024Updated last year
- patches for huggingface transformers to save memory☆36Jun 2, 2025Updated 10 months ago
- Optimize softmax in triton in many cases☆23Sep 6, 2024Updated last year
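- This repository deploys the Nanodet detection algorithm under the OpenVINO inference framework, with rewritten pre-processing and post-processing for very high performance and much faster detection on Intel CPU platforms. It also quantizes the model (PTQ) to int8 precision using the NNCF and PPQ tools for even faster inference.☆16Jun 14, 2023Updated 2 years ago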
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated last year
- ☆32Jul 2, 2025Updated 9 months ago
- A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.☆3,967Mar 13, 2026Updated last month
- Easy to use hybrid index for semantic + keyword search.☆17Jul 19, 2023Updated 2 years ago
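- Learn something after work in the evening instead of scrolling on your phone. Series 1: the CUDA compute framework CUFX (Cuda Framework eXtended).☆16Dec 15, 2024Updated last year
- Operator library☆17Jul 9, 2025Updated 9 months ago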
- News recommendation implemented on the Tianchi news recommendation competition dataset☆47Dec 17, 2024Updated last year
- Local SRT Generator based on NVIDIA NeMo 🔹 Switch between multiple Parakeet models 🔹 One-click video and audio transcription 🔹 Gradio GUI 🔹 GPU acceleration (CUDA…☆25Feb 4, 2026Updated 2 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding☆98Dec 2, 2025Updated 4 months ago
- AiTer Optimized Model☆61Updated this week
- A Triton-only attention backend for vLLM☆25Mar 17, 2026Updated last month
- My submission for the GPUMODE/AMD fp8 mm challenge☆29Jun 4, 2025Updated 10 months ago
- Nano vLLM☆12,816Nov 3, 2025Updated 5 months ago
- A parser for PTX 6.5☆13Jun 19, 2023Updated 2 years ago
- ☆98May 31, 2025Updated 10 months ago
- HPC Game Platform☆11Apr 20, 2023Updated 2 years ago
- ☆119May 16, 2025Updated 11 months ago
- A Toolkit for Fine-Tuning Large Language Models with LoRA and DeepSpeed☆11Apr 14, 2023Updated 3 years ago
- Flash attention optimization log☆28Jun 4, 2025Updated 10 months ago
- ☆16Jan 14, 2025Updated last year
- Minimalist vLLM implementation in Rust☆166Updated this week
- ☆11May 2, 2023Updated 2 years ago