Apiquet / DeepLearningFrameworkFromScratchCpp
A deep learning framework implemented from scratch in C++, with MSE, ReLU, softmax, a linear layer, a feature/label generator, and mini-batch training. The main goal of this repository is to show how to develop a C++ project using key language concepts: abstract classes/interfaces and inheritance, memory management, smart pointers, iterators, constant expressions, etc.
☆17 · Updated 5 months ago
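The description above names several C++ concepts (an abstract interface, inheritance, smart pointers) alongside ReLU and MSE. As a rough illustration of how those pieces might fit together, here is a minimal sketch. The names `Module`, `ReLU`, `Sequential`, and `mse` are hypothetical, chosen for this example, and are not taken from the repository's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical abstract interface: every layer exposes a forward pass.
// (Illustrative only; not the repository's actual class hierarchy.)
class Module {
public:
    virtual ~Module() = default;
    virtual std::vector<double> forward(const std::vector<double>& input) const = 0;
};

// ReLU activation: replaces each negative element with zero.
class ReLU final : public Module {
public:
    std::vector<double> forward(const std::vector<double>& input) const override {
        std::vector<double> out(input.size());
        std::transform(input.begin(), input.end(), out.begin(),
                       [](double x) { return std::max(0.0, x); });
        return out;
    }
};

// Container that owns its layers through std::unique_ptr, so they are
// released automatically when the container is destroyed.
class Sequential final : public Module {
public:
    void add(std::unique_ptr<Module> layer) { layers_.push_back(std::move(layer)); }

    std::vector<double> forward(const std::vector<double>& input) const override {
        std::vector<double> x = input;
        for (const auto& layer : layers_) {
            x = layer->forward(x);  // dispatched through the abstract interface
        }
        return x;
    }

private:
    std::vector<std::unique_ptr<Module>> layers_;
};

// Mean squared error between two vectors.
// Assumes both vectors have the same, non-zero length.
inline double mse(const std::vector<double>& pred, const std::vector<double>& target) {
    double sum = 0.0;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        const double diff = pred[i] - target[i];
        sum += diff * diff;
    }
    return sum / static_cast<double>(pred.size());
}
```

In this sketch a caller would build a network with `net.add(std::make_unique<ReLU>())` and evaluate it with `net.forward(...)`. Ownership stays with the container, so no manual `delete` is needed.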
Related projects:
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆118 · Updated last year
- A deep learning framework based on the Eigen library (with CUDA acceleration support). ☆16 · Updated 2 years ago
- A simple deep learning framework that supports automatic differentiation and GPU acceleration. ☆55 · Updated last year
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM. ☆39 · Updated 11 months ago
- Solutions to Programming Massively Parallel Processors, 2nd edition. ☆26 · Updated 2 years ago
- Datasets, Transforms and Models specific to Computer Vision. ☆82 · Updated 10 months ago
- Learning how CUDA works. ☆150 · Updated last month
- A layered, decoupled deep learning inference engine. ☆58 · Updated 3 weeks ago
- OneFlow->ONNX ☆41 · Updated last year
- Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS. ☆159 · Updated 3 months ago
- ☆13 · Updated 5 months ago
- [CVPR-2023] Towards Any Structural Pruning ☆17 · Updated last year
- Code and notes for six major CUDA parallel computing patterns. ☆57 · Updated 4 years ago
- ☆32 · Updated 3 months ago
- Paddle Automatically Diff Precision Toolkits. ☆46 · Updated 5 months ago
- TensorRT encapsulation, learn, rewrite, practice. ☆22 · Updated last year
- Courses on Bilibili. ☆69 · Updated last year
- ☆90 · Updated 6 months ago
- Some common CUDA kernel implementations (not the fastest). ☆11 · Updated last month
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆20 · Updated last week
- The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM ) ☆208 · Updated 4 months ago
- Step-by-step optimization of CUDA SGEMM. ☆207 · Updated 2 years ago
- ☆116 · Updated last year
- ☆151 · Updated this week
- A simple high performance CUDA GEMM implementation. ☆319 · Updated 8 months ago
- ☆100 · Updated 5 months ago
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX-1080 GPU. ☆36 · Updated last year
- ☆19 · Updated 2 years ago
- An easy-to-understand TensorOp Matmul tutorial. ☆265 · Updated this week
- A good project for campus, autumn, and spring recruitment and internships: build a large-model inference framework supporting LLama from scratch, hands-on. ☆170 · Updated this week