KarhouTam / cuda-kernels
Some common CUDA kernel implementations (Not the fastest).
☆11 Updated last month
Related projects:
- TensorRT encapsulation, learn, rewrite, practice. ☆22 Updated last year
- EasyNN is a neural network inference framework developed for teaching, designed so that even complete beginners can write an inference framework on their own. ☆22 Updated 3 weeks ago
- A simplified flash-attention implementation using cutlass, intended for teaching. ☆29 Updated last month
- Courses on Bilibili ☆69 Updated last year
- A layered, decoupled deep learning inference engine ☆58 Updated 3 weeks ago
- This code accompanies the Bilibili video https://www.bilibili.com/video/BV18L41197Uz/?spm_id_from=333.788&vd_source=eefa4b6e337f16d87d87c2c357db8ca7 ☆59 Updated 11 months ago
- ☆23 Updated last year
- ☆15 Updated last week
- A large collection of cuda/tensorrt cases for learning cuda/tensorrt ☆103 Updated 2 years ago
- ☢️ TensorRT 2023 second round: accelerating and optimizing Llama model inference with TensorRT-LLM ☆40 Updated 11 months ago
- Async inference for machine learning models ☆27 Updated last year
- ☆90 Updated 6 months ago
- ☆32 Updated 3 months ago
- ☆56 Updated last week
- Code and notes for the six major CUDA parallel computing patterns ☆57 Updated 4 years ago
- ☆11 Updated 9 months ago
- Thoroughly understand backpropagation (BP) in 15 lines of code; the C++ implementation is also simple; 98.29% accuracy on MNIST classification ☆33 Updated 2 years ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆20 Updated last week
- OneFlow->ONNX ☆41 Updated last year
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core. ☆40 Updated last week
- Serving Inside Pytorch ☆141 Updated last week
- A simple neural network inference framework ☆23 Updated last year
- A tutorial for CUDA&PyTorch ☆110 Updated last week
- ☆18 Updated 3 years ago
- ☆116 Updated last year
- A simple Transformer model implemented in C++. Attention Is All You Need. ☆37 Updated 3 years ago
- This is a repository to practice multi-thread programming in C++ ☆15 Updated 6 months ago
- A good project for campus recruitment, fall recruitment, spring recruitment, and internships: build a large-model inference framework that supports LLama from scratch, hands-on. ☆170 Updated this week
- An onnx-based quantization tool. ☆69 Updated 8 months ago
- TensorRT 2022 runner-up solution: accelerating the mobilevit model with tensorrt ☆56 Updated 2 years ago