Apiquet / DeepLearningFrameworkFromScratchCppLinks

Deep Learning framework implementation with MSE, ReLU, softmax, linear layer, a feature/label generator and a mini-batch training. The main goal of this repository is to show how to develop a project in C++ by using key concepts of C++: abstract class/interface and inheritance, memory management, smart-pointers, iterator, const expression, etc.

☆21

Alternatives and similar repositories for DeepLearningFrameworkFromScratchCpp

Users that are interested in DeepLearningFrameworkFromScratchCpp are comparing it to the libraries listed below

Sorting:

Li-TianCheng / TinyDL
基于Eigen运算库的深度学习框架(支持CUDA加速)
☆17Updated 3 years ago
luliyucoordinate / mynet
☆20Updated 3 years ago
Syencil / Programming_Massively_Parallel_Processors
CUDA 6大并行计算模式代码与笔记
☆61Updated 4 years ago
hova88 / CUDA-MatMul-Practice
☆16Updated last year
piDack / The-ans-for-Programming-Massively-Parallel-Processor
大规模并行处理器编程实战第二版答案
☆33Updated 3 years ago
caibucai22 / awesome-cuda
Awesome code, projects, books, etc. related to CUDA
☆18Updated 3 weeks ago
liwei-cpp / MetaNN
☆279Updated 4 years ago
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆70Updated 4 years ago
jundaf2 / CUDA-INT8-GEMM
CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API
☆31Updated last year
OpenPPL / ppl.kernel.cuda
☆37Updated 8 months ago
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆129Updated last year
caiwanxianhust / FasterLLaMA
使用 CUDA C++ 实现的 llama 模型推理框架
☆58Updated 8 months ago
BBuf / how-to-optimize-gemm
☆97Updated 3 years ago
zjhellofss / KuiperCourse
b站上的课程
☆75Updated last year
AyakaGEMM / Hands-on-GEMM
☆137Updated last year
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
☆83Updated last month
LB-Yu / tinyflow
A simple deep learning framework that supports automatic differentiation and GPU acceleration.
☆58Updated 2 years ago
luliyucoordinate / cute-flash-attention
Implement Flash Attention using Cute.
☆87Updated 6 months ago
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆63Updated 10 months ago
yester31 / Cutlass_EX
study of cutlass
☆21Updated 7 months ago
weishengying / tiny-flash-attention
使用 cutlass 实现 flash-attention 精简版，具有教学意义
☆42Updated 10 months ago
njuhope / cuda_sgemm
☆113Updated last year
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆348Updated 3 years ago
harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆133Updated this week
l1nkr / DL-Compiler-Navigation
Machine Learning Compiler Road Map
☆43Updated last year
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆197Updated 11 months ago
ThoenigAdrian / NeuralNetworksCudaTutorial
Implement Neural Networks in Cuda from Scratch
☆23Updated last year
lzyrapx / LeetGPU
Solutions of LeetGPU
☆28Updated this week
kilianhae / FlashAttention.C
Flash Attention in raw Cuda C beating PyTorch
☆23Updated last year
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆200Updated last week