DACUS1995 / gradflow
A small, educational autograd system with support for deep neural networks
☆13 Updated 2 years ago
Alternatives and similar repositories for gradflow
Users that are interested in gradflow are comparing it to the libraries listed below
- ☆262 Updated 11 months ago
- My solutions to the assignments of CMU 10-714 Deep Learning Systems 2022 ☆45 Updated last year
- ☆23 Updated last year
- ☆177 Updated 2 years ago
- Code base and slides for ECE408: Applied Parallel Programming On GPU. ☆145 Updated 4 years ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆257 Updated last year
- A lightweight deep learning library ☆392 Updated 2 months ago
- A simple deep learning framework in pure python for purpose of learning in DL ☆448 Updated 11 months ago
- A Numpy implementation of a Convolutional Neural Network: slow & fast (im2col/col2im). ☆58 Updated 2 years ago
- A collection of research papers on low-precision training methods ☆60 Updated 9 months ago
- PDFs and Codelabs for the Efficient Deep Learning book. ☆204 Updated 2 years ago
- A super light-weight deep learning library based on NumPy in PyTorch fashion. ☆94 Updated 4 years ago
- ☆77 Updated last year
- Simple neural network implementation using CUDA technology. It is an educational implementation. ☆98 Updated 7 years ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆77 Updated 5 years ago
- Collect optimizer related papers, data, repositories ☆99 Updated last year
- How to train a CNN to 99% accuracy on MNIST in less than a second on a laptop ☆70 Updated 2 years ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆162 Updated 2 months ago
- cnn ☆134 Updated 6 years ago
- ☆56 Updated 5 months ago
- The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM ) ☆230 Updated 2 months ago
- Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning". ☆129 Updated 2 years ago
- Collection of kernels written in Triton language ☆178 Updated 2 weeks ago
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training ☆199 Updated 3 years ago
- CUDA Matrix Multiplication Optimization ☆256 Updated last year
- Implementation of FlashAttention in PyTorch ☆180 Updated last year
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆136 Updated 2 years ago
- High Performance Int8 GEMM Kernels for SM80 and later GPUs. ☆19 Updated 10 months ago
- summer school materials ☆46 Updated 2 years ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆281 Updated 3 months ago