godweiyang / NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
☆1,305Updated 3 years ago
Related projects:
- How to optimize some algorithms in CUDA.☆1,443Updated this week
- ☆972Updated 6 months ago
- A quickstart and benchmark for pytorch distributed training.☆1,617Updated last month
- 🎉CUDA/C++ notes / technical blog: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, his…☆1,140Updated this week
- Simple samples for TensorRT programming☆1,477Updated 2 weeks ago
- PyTorch Project Specification.☆660Updated 3 years ago
- Some tricks of pytorch...☆1,152Updated 2 months ago
- Sample codes for my CUDA programming book☆1,524Updated last year
- ☆2,104Updated 8 months ago
- real Transformer TeraFLOPS on various GPUs☆859Updated 8 months ago
- PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.☆1,511Updated 5 months ago
- pytorch memory track code☆992Updated 3 years ago
- A summary of the methods and principles of single-machine multi-GPU training in PyTorch☆741Updated 2 years ago
- ☆584Updated 3 months ago
- Parallel programming tutorials☆599Updated 3 years ago
- A collection of compiler learning resources.☆2,049Updated 3 months ago
- This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…☆803Updated last year
- This is a Chinese translation of the CUDA programming guide☆1,176Updated last year
- Model Quantization Benchmark☆752Updated 3 months ago
- fitlog is a tool that helps users record logs and manage code during deep learning training☆1,463Updated 8 months ago
- An easy/swift-to-adapt PyTorch-Lightning template. A simple, easy-to-use wrapper template: with minor changes to your original PyTorch code, it adapts to Lightning. You can translate your previous Pytorch code much…☆1,300Updated last year
- micronet, a model compression and deploy lib. compression: 1. quantization: quantization-aware-training(QAT), High-Bit(>2b)(DoReFa/Quantiz…☆2,212Updated 2 years ago
- A simple deep learning framework in pure Python for the purpose of learning DL☆421Updated last year
- row-major matmul optimization☆584Updated last year
- how to learn PyTorch and OneFlow☆327Updated 5 months ago
- A lightweight deep learning library☆369Updated 11 months ago
- A simple network quantization demo using pytorch from scratch.☆497Updated last year
- Sublinear memory optimization for deep learning. https://arxiv.org/abs/1604.06174☆587Updated 4 years ago
- A primitive library for neural network☆1,268Updated last month
- The road to hack SysML and become a system expert☆424Updated 2 weeks ago