Infrasys-AI / infrasys-ai.github.io

Open-source course project for AIInfra and AISystem

☆37

Alternatives and similar repositories for infrasys-ai.github.io

Users who are interested in infrasys-ai.github.io are comparing it to the repositories listed below

doongz / mlc-ai
Machine Learning Compilation (Tianqi Chen)
☆53 Updated 3 years ago
interestingLSY / CUDA-From-Correctness-To-Performance-Code
Codes & examples for "CUDA - From Correctness to Performance"
☆121 Updated last year
li199603 / sgemm_with_cuda
SGEMM optimization with cuda step by step
☆21 Updated last year
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
☆148 Updated 9 months ago
InfiniTensor / RefactorGraph
A layered, decoupled deep learning inference engine
☆79 Updated 11 months ago
caijixueIT / CUDA_Learning_for_Freshman
☆14 Updated 3 months ago
InfiniTensor / InfiniTensor
☆288 Updated last week
StudyingLover / ggml-tutorial
☆34 Updated last year
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
☆107 Updated this week
piDack / The-ans-for-Programming-Massively-Parallel-Processor
Answers for Programming Massively Parallel Processors: A Hands-on Approach (2nd edition)
☆35 Updated 3 years ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 Updated last year
KuangjuX / cu-x
🎉My Collections of CUDA Kernels~
☆11 Updated last year
YuxueYang1204 / CudaDemo
Implement custom operators in PyTorch with cuda/c++
☆76 Updated 3 years ago
ByteDance-Seed / cudaLLM
☆130 Updated 5 months ago
gogongxt / nano-sglang
☆117 Updated last month
hyperai / triton-cn
Triton documentation in Simplified Chinese
☆103 Updated last month
caiwanxianhust / FasterLLaMA
A LLaMA model inference framework implemented in CUDA C++
☆64 Updated last year
xxxxyu / FlexNN
[MobiCom 24] Efficient and Adaptive DNN inference under changeable memory budgets
☆58 Updated last year
moonquest-ai / SRDA
☆30 Updated 8 months ago
lzyrapx / LeetGPU
🌈 Solutions of LeetGPU
☆71 Updated last week
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆136 Updated 2 years ago
xlite-dev / ffpa-attn
🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
☆250 Updated this week
CalvinXKY / BasicCUDA
A tutorial for CUDA&PyTorch
☆253 Updated last week
dsl-learn / cutile-learn
NVIDIA cuTile learn
☆158 Updated 2 months ago
XiaoSongXS / HPC-Notes
Personal Notes for Learning HPC & Parallel Computation [NO LONGER ADDING NEW CONTENT]
☆77 Updated 3 years ago
QianyanTech / NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
☆94 Updated 2 years ago
harleyszhang / llm_counts
LLM theoretical performance analysis tools, supporting parameter, FLOPs, memory, and latency analysis.
☆115 Updated 7 months ago
flagos-ai / FlagTree
FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang…
☆211 Updated this week
caibucai22 / awesome-cuda
Awesome code, projects, books, etc. related to CUDA
☆30 Updated last week
luliyucoordinate / cute-flash-attention
Implement Flash Attention using Cute.
☆100 Updated last year