Tutorial for building a custom CUDA function for PyTorch
☆523 · Jan 25, 2019 · Updated 7 years ago
Alternatives and similar repositories for pytorch-custom-cuda-tutorial
Users that are interested in pytorch-custom-cuda-tutorial are comparing it to the libraries listed below.
- How and why you want to make your pytorch CUDA/CPP extension with a Makefile ☆172 · Jul 3, 2019 · Updated 6 years ago
- an example of a CUDA extension for PyTorch using CuPy which computes the Hadamard product of two tensors ☆119 · May 26, 2025 · Updated 9 months ago
- CuPy fused PyTorch neural networks ops ☆273 · Feb 15, 2018 · Updated 8 years ago
- C++ extensions in PyTorch ☆1,186 · Jan 13, 2026 · Updated 2 months ago
- PyTorch implementation of Wide Residual Networks with 1-bit weights by McDonnell (ICLR 2018) ☆126 · Sep 6, 2018 · Updated 7 years ago
- Examples of C extensions for PyTorch ☆256 · Feb 12, 2023 · Updated 3 years ago
- ATen: A TENsor library for C++11 ☆717 · Nov 20, 2019 · Updated 6 years ago
- PyTorch implementation of Deformable Convolution ☆411 · Feb 17, 2019 · Updated 7 years ago
- Benchmark for matrix multiplications between dense and block sparse (BSR) matrix in TVM, blocksparse (Gray et al.) and cuSparse. ☆23 · Aug 21, 2020 · Updated 5 years ago
- Pytorch Custom CUDA kernel for searchsorted ☆137 · Oct 25, 2023 · Updated 2 years ago
- Synchronized Batch Normalization implementation in PyTorch. ☆1,503 · Apr 8, 2021 · Updated 4 years ago
- PyTorch Extension Library of Optimized Scatter Operations ☆1,730 · Mar 9, 2026 · Updated 2 weeks ago
- Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors ☆2,888 · Mar 5, 2024 · Updated 2 years ago
- PyTorch layer-by-layer model profiler ☆606 · May 23, 2021 · Updated 4 years ago
- PyTorch implementation of Deformable Convolution ☆911 · Jul 21, 2021 · Updated 4 years ago
- A CV toolkit for my papers. ☆2,048 · Dec 21, 2024 · Updated last year
- A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch ☆8,936 · Updated this week
- Example repository for custom C++/CUDA operators for TorchScript ☆114 · Aug 28, 2022 · Updated 3 years ago
- Submanifold sparse convolutional networks ☆2,142 · Jan 9, 2024 · Updated 2 years ago
- Count the MACs / FLOPs of your PyTorch model. ☆5,080 · Jul 8, 2024 · Updated last year
- In-Place Activated BatchNorm for Memory-Optimized Training of DNNs ☆1,334 · Jul 8, 2025 · Updated 8 months ago
- Training Very Deep Neural Networks Without Skip-Connections ☆589 · Jun 9, 2018 · Updated 7 years ago
- PyTorch implementation of spectral graph ConvNets, NeurIPS’16 ☆292 · Oct 15, 2017 · Updated 8 years ago
- A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available. ☆2,818 · Sep 5, 2019 · Updated 6 years ago
- Detectorch - detectron for PyTorch ☆559 · Oct 30, 2018 · Updated 7 years ago
- Pytorch implementation of MaxPoolingLoss. ☆177 · Jun 9, 2018 · Updated 7 years ago
- Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) ☆9,438 · Feb 20, 2026 · Updated last month
- PyTorch bindings for openai-gemm ☆20 · Feb 6, 2017 · Updated 9 years ago
- tensorboard for pytorch (and chainer, mxnet, numpy, ...) ☆7,990 · Feb 5, 2026 · Updated last month
- Pytorch C++ Library ☆365 · May 16, 2018 · Updated 7 years ago
- PyTorch code for the "Deep Neural Networks with Box Convolutions" paper ☆509 · Jan 20, 2020 · Updated 6 years ago
- Model summary in PyTorch similar to `model.summary()` in Keras ☆4,063 · Mar 2, 2024 · Updated 2 years ago
- Model analyzer in PyTorch ☆1,501 · Mar 19, 2023 · Updated 3 years ago
- RetinaNet in PyTorch ☆999 · Mar 17, 2019 · Updated 7 years ago
- Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch. ☆9,386 · Feb 16, 2023 · Updated 3 years ago
- CondenseNet: Light weighted CNN for mobile devices ☆691 · Nov 11, 2019 · Updated 6 years ago
- Experiments with differentiable stacks and queues in PyTorch ☆145 · Oct 7, 2019 · Updated 6 years ago
- Collections of self-supervised methods, based on cvpods. ☆57 · Aug 21, 2021 · Updated 4 years ago
- A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep lear… ☆5,647 · Updated this week