OpenMLIR / LeetGPU

☆21

Alternatives and similar repositories for LeetGPU

Users who are interested in LeetGPU are comparing it to the libraries listed below.

PKUFlyingPig / MIT6.5940_TinyML
Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing
☆51 Updated 6 months ago
InfiniTensor / InfiniTensor
☆246 Updated last month
harleyszhang / llm_counts
LLM theoretical performance analysis tools, supporting params, FLOPs, memory, and latency analysis.
☆99 Updated 2 weeks ago
guanrenyang / Programming-Massively-Parallel-Processors
Solutions to Programming Massively Parallel Processors
☆47 Updated last year
sunkx109 / GPUs-Specs
Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM
☆52 Updated 4 months ago
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …
☆237 Updated last month
JackonYang / hands-on-tvm
Hands-on model tuning with TVM, profiled on a Mac M1, x86 CPU, and GTX-1080 GPU.
☆49 Updated 2 years ago
AdvancedCompiler / AdvancedCompiler
Homepage of the Advanced Compiler Lab
☆114 Updated 3 months ago
caiwanxianhust / FasterLLaMA
A llama model inference framework implemented in CUDA C++
☆58 Updated 8 months ago
hyperai / triton-cn
Triton documentation in Simplified Chinese
☆75 Updated 3 months ago
SiriusNEO / Triton-Puzzles-Lite
Puzzles for learning Triton, play it with minimal environment configuration!
☆435 Updated 7 months ago
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆156 Updated 3 months ago
gty111 / gLLM
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
☆36 Updated last week
triton-lang / triton-cpu
An experimental CPU backend for Triton
☆138 Updated last month
interestingLSY / CUDA-From-Correctness-To-Performance-Code
Code & examples for "CUDA - From Correctness to Performance"
☆103 Updated 9 months ago
violetDelia / LLCompiler
☆18 Updated last month
DD-DuDa / Cute-Learning
Examples of CUDA implementations using CUTLASS CuTe
☆209 Updated 3 weeks ago
harleyszhang / lite_llama
A lightweight llama-like LLM inference framework based on Triton kernels.
☆138 Updated this week
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆128 Updated 6 months ago
InternLM / Awesome-LLM-Training-System
☆41 Updated 11 months ago
luliyucoordinate / cute-flash-attention
Flash Attention implemented using CuTe.
☆89 Updated 7 months ago
zjhellofss / triton_course
☆31 Updated 2 months ago
MoE-Inf / awesome-moe-inference
Curated collection of papers in MoE model inference
☆213 Updated 5 months ago
l1nkr / DL-Compiler-Navigation
Machine Learning Compiler Road Map
☆43 Updated last year
XiaoSong9905 / HPC-Notes
Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]
☆68 Updated 2 years ago
xgqdut2016 / cuda_code
Easy CUDA code
☆78 Updated 7 months ago
xlite-dev / ffpa-attn
⚡️FFPA: Extend FlashAttention-2 with Split-D, achieve ~O(1) SRAM complexity for large headdim, 1.8x~3x↑ vs SDPA.🎉
☆192 Updated 2 months ago
CalebDu / Awesome-Cute
☆88 Updated 2 months ago
Chtholly-Boss / swizzle
A practical way of learning Swizzle
☆22 Updated 5 months ago
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆85 Updated last month