zjhellofss/triton_course

zjhellofss / triton_course

☆52

Alternatives and similar repositories for triton_course

Users that are interested in triton_course are comparing it to the libraries listed below.

harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆188 · Jan 5, 2026 · Updated 6 months ago
harleyszhang / llm_counts
llm theoretical performance analysis tools and support params, flops, memory and latency analysis.
☆119 · Jul 11, 2025 · Updated last year
zjhellofss / KuiperLLama
A good project for campus, autumn, and spring recruitment and internships: build from scratch a large-model inference framework that supports LLama2/3 and Qwen2.5.
☆555 · Oct 28, 2025 · Updated 9 months ago
yhwang-hub / OrinMLLM
This project is primarily used to deploy large language models and multimodal large models on Orin.🚀🚀🚀
☆18 · Jun 23, 2026 · Updated last month
RussWong / CUDATutorial
A CUDA tutorial to make people learn CUDA program from 0
☆279 · Jul 9, 2024 · Updated 2 years ago
NVIDIA-AI-IOT / deepstream_triton_migration
Triton Migration Guide for DeepStreamSDK.
☆15 · Dec 19, 2023 · Updated 2 years ago
YangLinzhuo / cuda-sgemm-optimization
CUDA SGEMM optimization note
☆15 · Oct 31, 2023 · Updated 2 years ago
zjhellofss / KuiperInfer
A good project for campus, autumn, and spring recruitment and internships! Build a high-performance deep learning inference library from scratch, supporting inference for models such as llama2, Unet, Yolov5, and Resnet. Implement a high-performance deep learning inference library st…
☆3,476 · Jun 22, 2025 · Updated last year
hengshan / Cuda-Tutorials
CUDA 13.1 Tutorial Series for RTX 5090 (Blackwell) - Chinese teaching materials
☆29 · Jan 18, 2026 · Updated 6 months ago
luliyucoordinate / cute-flash-attention
Implement Flash Attention using Cute.
☆111 · Dec 17, 2024 · Updated last year
BBuf / KDA-Pilot
☆234 · Updated this week
xgqdut2016 / hpc_project
some hpc project for learning
☆28 · Aug 28, 2024 · Updated last year
gxinlong / cuda-optimization-skill
A skill for automatically optimizing CUDA code.
☆42 · Mar 26, 2026 · Updated 4 months ago
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
GuangtaoLyu / PSSTRNet
☆13 · Jul 28, 2024 · Updated 2 years ago
wangzyon / trt_learn
TensorRT encapsulation, learn, rewrite, practice.
☆31 · Oct 19, 2022 · Updated 3 years ago
Manojbhat09 / nanoVLA
minimal Vision Language Action framework for robot control systems
☆17 · Sep 15, 2025 · Updated 10 months ago
yl-jiang / YOLOSeries
YOLO Series
☆14 · Oct 20, 2023 · Updated 2 years ago
ZonePG / cs-notes
my cs notes
☆72 · Oct 14, 2024 · Updated last year
zjhellofss / kuiperbook
☆17 · Apr 23, 2026 · Updated 3 months ago
caiwanxianhust / FasterLLaMA
A llama model inference framework implemented in CUDA C++.
☆65 · Nov 8, 2024 · Updated last year
wzx99 / TMIM
☆13 · Oct 17, 2024 · Updated last year
l-sf / Nanodet_openvino_quant_deploy
This repository deploys the Nanodet detection algorithm under the OpenVINO inference framework and rewrites the preprocessing and postprocessing parts for very high performance, making detection on Intel CPU platforms take off! It also quantizes the model (PTQ) to int8 precision with the NNCF and PPQ tools for even faster inference!
☆16 · Jun 14, 2023 · Updated 3 years ago
Liu-xiandong / How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,335 · Jul 29, 2023 · Updated 3 years ago
lcy0604 / CTRNet-plus
The official implement of CTRNet++.
☆15 · Dec 30, 2024 · Updated last year
wangdh15 / cs149
☆12 · May 19, 2022 · Updated 4 years ago
bugph0bia / DeZeroCpp
Implements "Deep Learning from Scratch ❸" (ゼロから作るDeep Learning ❸) in C++. A self-study repository.
☆18 · Aug 12, 2020 · Updated 5 years ago
aj-talaei / GPU_Programming_Specialization
This repository contains my coursework and projects completed during the GPU Programming Specialization offered by Johns Hopkins Universi…
☆11 · Jun 13, 2023 · Updated 3 years ago
xgqdut2016 / hpc2torch
☆40 · Jun 25, 2026 · Updated last month
OpenPPL / ppl.llm.kernel.cuda
☆150 · Jan 9, 2025 · Updated last year
peijunallin / alphalora
☆19 · Nov 10, 2024 · Updated last year
mrzhuzhe / riven
CPU Memory Compiler and Parallel programing
☆26 · Nov 18, 2024 · Updated last year
leon0514 / trt-sahi-yolo
Accelerating SAHI-based inference on YOLO models using TensorRT.
☆103 · Jan 6, 2026 · Updated 6 months ago
toyaix / tritonllm
LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model
☆119 · Apr 28, 2026 · Updated 3 months ago
BaofengZan / hard_decode_trt-windows
Windows build of https://github.com/shouxieai/hard_decode_trt
☆13 · Sep 8, 2022 · Updated 3 years ago
melonedo / algebraic-layouts
☆23 · Aug 20, 2025 · Updated 11 months ago
Dancingmader / 3D-High-quality-Garment-Dataset
☆15 · Oct 9, 2022 · Updated 3 years ago
star-hengxing / cs149-xmake
CS149 xmake version
☆46 · Nov 30, 2023 · Updated 2 years ago
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆558 · Sep 8, 2024 · Updated last year