xiatwhu / baidu_topk
☆11 Updated 9 months ago
Related projects:
- ☆90 Updated 6 months ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆20 Updated last week
- ☆56 Updated last week
- ☆70 Updated 6 months ago
- ☆133 Updated 2 months ago
- ☆140 Updated 4 months ago
- A self-learning tutorial for CUDA high-performance programming. ☆119 Updated 2 months ago
- A tutorial for CUDA & PyTorch ☆110 Updated last week
- ☆32 Updated 3 months ago
- ☆123 Updated 3 months ago
- flash attention tutorial written in python, triton, cuda, cutlass ☆159 Updated 3 months ago
- learning how CUDA works ☆150 Updated last month
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆118 Updated last year
- ☆77 Updated last year
- Courses from Bilibili ☆69 Updated last year
- ☆48 Updated 2 years ago
- ☆67 Updated last week
- A simplified flash-attention implementation using cutlass, intended for teaching ☆29 Updated last month
- TensorRT encapsulation: learn, rewrite, practice. ☆22 Updated last year
- Yinghan's Code Sample ☆272 Updated 2 years ago
- Several optimization methods of half-precision general matrix-vector multiplication (HGEMV) using CUDA cores. ☆40 Updated last week
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instruct… ☆266 Updated last week
- A CUDA tutorial for learning CUDA programming from scratch ☆177 Updated 2 months ago
- ☢️ TensorRT 2023 competition (second round): Llama model inference acceleration optimization based on TensorRT-LLM ☆40 Updated 11 months ago
- A layered, decoupled deep learning inference engine ☆58 Updated 3 weeks ago
- A simple high-performance CUDA GEMM implementation. ☆319 Updated 8 months ago
- FP8 flash attention for the Ada architecture, implemented with the cutlass library ☆46 Updated last month
- ☆95 Updated 2 years ago
- A good project for campus recruiting (fall/spring) and internships: build an LLM inference framework supporting LLaMA from scratch. ☆170 Updated this week
- Transformer-related optimizations, including BERT and GPT ☆58 Updated last year