zjhellofss/KuiperLLama

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/zjhellofss/KuiperLLama)

zjhellofss / KuiperLLama

校招、秋招、春招、实习好项目，带你从零动手实现支持LLama2/3和Qwen2.5的大模型推理框架。

☆539

Alternatives and similar repositories for KuiperLLama

Users that are interested in KuiperLLama are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

zjhellofss / KuiperInfer
View on GitHub
校招、秋招、春招、实习好项目！带你从零实现一个高性能的深度学习推理库，支持大模型 llama2 、Unet、Yolov5、Resnet等模型的推理。Implement a high-performance deep learning inference library st…
☆3,427Jun 22, 2025Updated 11 months ago
harleyszhang / lite_llama
View on GitHub
A light llama-like llm inference framework based on the triton kernel.
☆186Jan 5, 2026Updated 4 months ago
caiwanxianhust / FasterLLaMA
View on GitHub
使用 CUDA C++ 实现的 llama 模型推理框架
☆65Nov 8, 2024Updated last year
zjhellofss / triton_course
View on GitHub
☆50Mar 4, 2026Updated 2 months ago
zjhellofss / kuiperdatawhale
View on GitHub
☆323Oct 9, 2024Updated last year
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
xlite-dev / LeetCUDA
View on GitHub
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆11,050Updated this week
BBuf / how-to-optim-algorithm-in-cuda
View on GitHub
how to optimize some algorithm in cuda.
☆2,998Updated this week
916241958 / yolov8-Onnxruntime
View on GitHub
通过onnxruntime实现yolov8在CPU和GPU上面部署
☆27Aug 17, 2024Updated last year
RussWong / CUDATutorial
View on GitHub
A CUDA tutorial to make people learn CUDA program from 0
☆279Jul 9, 2024Updated last year
Tongkaio / CUDA_Kernel_Samples
View on GitHub
CUDA 算子手撕与面试指南
☆972Aug 23, 2025Updated 9 months ago
kiloGrand / kuiperinfer
View on GitHub
自制基于C++的深度学习前向推理框架
☆21Jun 4, 2023Updated 2 years ago
66RING / tiny-flash-attention
View on GitHub
flash attention tutorial written in python, triton, cuda, cutlass
☆516Jan 20, 2026Updated 4 months ago
harleyszhang / llm_counts
View on GitHub
llm theoretical performance analysis tools and support params, flops, memory and latency analysis.
☆119Jul 11, 2025Updated 10 months ago
ifromeast / cuda_learning
View on GitHub
learning how CUDA works
☆388Mar 3, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
nndeploy / nndeploy
View on GitHub
一款简单易用和高性能的AI部署框架 | An Easy-to-Use and High-Performance AI Deployment Framework
☆1,818Apr 25, 2026Updated 3 weeks ago
Liu-xiandong / How_to_optimize_in_GPU
View on GitHub
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,303Jul 29, 2023Updated 2 years ago
TRT2022 / trtllm-llama
View on GitHub
☢️ TensorRT 2023复赛——基于TensorRT-LLM的Llama模型推断加速优化
☆52Oct 20, 2023Updated 2 years ago
gpu-mode / lectures
View on GitHub
Material for gpu-mode lectures
☆6,080May 9, 2026Updated 2 weeks ago
yuxiaoranyu / stable_diffusion_trt_triton
View on GitHub
☆20Dec 29, 2023Updated 2 years ago
hopef / llama3_chat
View on GitHub
Llama3 Streaming Chat Sample
☆22Apr 24, 2024Updated 2 years ago
tspeterkim / flash-attention-minimal
View on GitHub
Flash Attention in ~100 lines of CUDA (forward pass only)
☆1,140Dec 30, 2024Updated last year
YangLinzhuo / cuda-sgemm-optimization
View on GitHub
CUDA SGEMM optimization note
☆15Oct 31, 2023Updated 2 years ago
BBuf / how-to-learn-deep-learning-framework
View on GitHub
how to learn PyTorch and OneFlow
☆496Updated this week
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
xgqdut2016 / hpc_project
View on GitHub
some hpc project for learning
☆28Aug 28, 2024Updated last year
zjhellofss / kuiperbook
View on GitHub
☆15Apr 23, 2026Updated last month
PaddleJitLab / CUDATutorial
View on GitHub
A self-learning tutorail for CUDA High Performance Programing.
☆989Jan 14, 2026Updated 4 months ago
SongQiPing / KuiperInfer_rs
View on GitHub
使用 Rust 语言重新实现 https://github.com/zjhellofss/KuiperInfer 和 https://github.com/zjhellofss/kuiperdatawhale 中的深度学习推理框架。
☆17Apr 9, 2024Updated 2 years ago
mrzhuzhe / riven
View on GitHub
CPU Memory Compiler and Parallel programing
☆26Nov 18, 2024Updated last year
xlite-dev / Awesome-LLM-Inference
View on GitHub
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
☆5,229Apr 20, 2026Updated last month
harleyszhang / llm_note
View on GitHub
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.
☆881May 10, 2026Updated last week
flagos-ai / FlagGems
View on GitHub
FlagGems is an operator library for large language models implemented in the Triton Language.
☆996Updated this week
Tony-Tan / CUDA_Freshman
View on GitHub
☆2,742Jan 16, 2024Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
ozanarmagan / clip_tokenizer_cpp
View on GitHub
☆10Jul 18, 2024Updated last year
yvonwin / qwen2.cpp
View on GitHub
qwen2 and llama3 cpp implementation
☆50Jun 7, 2024Updated last year
sesmfs / onnx_matcher
View on GitHub
Using pattern matcher in onnx model to match and replace subgraphs.
☆81Feb 7, 2024Updated 2 years ago
zjhellofss / KuiperCourse
View on GitHub
b站上的课程
☆86Aug 25, 2023Updated 2 years ago
XiaoSongXS / CUDA-Optimization-Guide
View on GitHub
Xiao's CUDA Optimization Guide [NO LONGER ADDING NEW CONTENT]
☆325Nov 8, 2022Updated 3 years ago
weishengying / cutlass_flash_atten_fp8
View on GitHub
使用 cutlass 仓库在 ada 架构上实现 fp8 的 flash attention
☆81Aug 12, 2024Updated last year
Cjkkkk / CUDA_gemm
View on GitHub
A simple high performance CUDA GEMM implementation.
☆434Jan 4, 2024Updated 2 years ago