RussWong / LLM-engineering
☆15Updated 6 months ago
Related projects
Alternatives and complementary repositories for LLM-engineering
- learning how CUDA works☆169Updated 3 months ago
- ☆100Updated 8 months ago
- A CUDA tutorial for learning CUDA programming from scratch☆196Updated 4 months ago
- Examples of CUDA implementations by Cutlass CuTe☆101Updated last week
- ☆32Updated last month
- CPU Memory Compiler and Parallel programming☆24Updated this week
- ☆79Updated 8 months ago
- ☆138Updated 2 weeks ago
- A tutorial for CUDA&PyTorch☆118Updated 3 weeks ago
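- A good project for campus recruitment (autumn and spring hiring) and internships: build, hands-on and from scratch, a large-model inference framework that supports LLama2/3 and Qwen2.5.☆228Updated 2 weeks ago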
- ☆57Updated this week
- Codes & examples for "CUDA - From Correctness to Performance"☆70Updated last month
- Xiao's CUDA Optimization Guide [Active Adding New Contents]☆237Updated 2 years ago
- Courses on Bilibili☆70Updated last year
- Tutorials for writing high-performance GPU operators in AI frameworks.☆123Updated last year
- ☆110Updated 2 years ago
- Code and notes for the 6 major CUDA parallel computing patterns☆58Updated 4 years ago
- TensorRT encapsulation, learn, rewrite, practice.☆25Updated 2 years ago
- Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration☆38Updated 5 months ago
- ☆103Updated 7 months ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.☆50Updated 2 months ago
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.☆157Updated this week
- A layered, decoupled deep learning inference engine☆60Updated 3 months ago
- fp8 flash attention implemented on the ada architecture using the cutlass repository☆52Updated 3 months ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…☆304Updated 2 months ago
- This project is about convolution operator optimization on GPU, including GEMM-based (Implicit GEMM) convolution.☆20Updated 2 months ago
- Machine Learning Compiler Road Map☆42Updated last year
- Yinghan's Code Sample☆289Updated 2 years ago
- ☆79Updated last year
- An Easy-to-understand TensorOp Matmul Tutorial☆294Updated 2 months ago