PyTorch extension enabling direct access to cuDNN-accelerated C++ convolution functions.
☆13 · Mar 14, 2021 · Updated 4 years ago
Alternatives and similar repositories for PyTorch-cuDNN-Convolution
Users who are interested in PyTorch-cuDNN-Convolution are comparing it to the libraries listed below.
- code for the paper "A Statistical Framework for Low-bitwidth Training of Deep Neural Networks" ☆29 · Oct 31, 2020 · Updated 5 years ago
- A simple script to plot the Roofline model for given HW platforms and applications ☆10 · Aug 22, 2024 · Updated last year
- Multi Stopwatch for Python ☆12 · Sep 28, 2019 · Updated 6 years ago
- A minimal reverse proxy with flask ☆11 · Jan 26, 2022 · Updated 4 years ago
- ☆10 · Jul 14, 2019 · Updated 6 years ago
- PyTorch code for full quantization of DNN using BCGD ☆14 · Jul 24, 2019 · Updated 6 years ago
- Code needed to reproduce results from my ICLR 2019 paper on fixed-point quantization of the backprop algorithm. ☆10 · Jan 24, 2019 · Updated 7 years ago
- CASLab-GPU simulator in SystemC ☆11 · May 29, 2020 · Updated 5 years ago
- Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs" ☆19 · Jul 11, 2024 · Updated last year
- Sample app to help creating zip file to be trained for Einstein Vision Object Detection ☆13 · Jun 11, 2018 · Updated 7 years ago
- A selective knowledge distillation algorithm for efficient speculative decoders ☆36 · Nov 27, 2025 · Updated 3 months ago
- Find, list, and inspect processes from Go (golang). ☆10 · Feb 4, 2018 · Updated 8 years ago
- Synchronized Multi-GPU Batch Normalization for PyTorch based on https://github.com/tamakoji/pytorch-syncbn ☆12 · Nov 22, 2018 · Updated 7 years ago
- LLM implementation one matrix multiplication at a time ☆13 · Aug 8, 2024 · Updated last year
- An open-source tool for sequence learning in NLP built on TensorFlow. ☆11 · Dec 23, 2021 · Updated 4 years ago
- community-maintained pip-installable binaries (wheels) for the "extended + withdeploy" edition of the Hugo static site generator with pow… ☆16 · Updated this week
- KimiaPath24: Dataset for retrieval and classification in digital pathology ☆13 · Jun 4, 2017 · Updated 8 years ago
- Efficient GPU kernels for mixed-precision Vision Transformers in Triton ☆18 · Sep 18, 2025 · Updated 5 months ago
- A direct convolution library targeting ARM multi-core CPUs. ☆12 · Nov 27, 2024 · Updated last year
- share and synchronize your slides ☆13 · Jun 23, 2016 · Updated 9 years ago
- Iterative H-minima Based Marker-Controlled Watershed for Cell Nucleus Segmentation ☆12 · Mar 15, 2017 · Updated 8 years ago
- Optimizing Deep Convolutional Neural Network with Ternarized Weights and High Accuracy ☆16 · Jan 27, 2019 · Updated 7 years ago
- ☆16 · Nov 26, 2020 · Updated 5 years ago
- This repository provides code source used in the paper: A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off ☆13 · May 30, 2019 · Updated 6 years ago
- MVC Web Application Framework with Tornado, Python 2 and 3 ☆13 · May 20, 2025 · Updated 9 months ago
- Unit Scaling demo and experimentation code ☆16 · Mar 12, 2024 · Updated last year
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆16 · Aug 31, 2023 · Updated 2 years ago
- [ICML 2021] "Double-Win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inferen… ☆16 · Feb 13, 2022 · Updated 4 years ago
- Zero Dependency LibTorch Safetensors Loading and Storing in C++ ☆23 · Jul 12, 2024 · Updated last year
- Convert C files into Verilog ☆21 · Jan 27, 2019 · Updated 7 years ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆20 · Jan 24, 2025 · Updated last year
- Feature Aware Normalization - Code for "Context-based Normalization of Histological Stains using Deep Convolutional Features" ☆15 · Nov 11, 2018 · Updated 7 years ago
- My Assignment for CSE 599w http://dlsys.cs.washington.edu/ ☆16 · Dec 2, 2019 · Updated 6 years ago
- ROS Wrapper for openpose https://github.com/CMU-Perceptual-Computing-Lab/openpose ☆15 · Jul 11, 2017 · Updated 8 years ago
- (GraphRec) Attribute-aware non-linear co-embeddings of graph features, RecSys 2019 ☆12 · Aug 6, 2020 · Updated 5 years ago
- Official repository for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices". ☆20 · Feb 23, 2024 · Updated 2 years ago
- Open source code of BGL NSDI 2023 ☆18 · Jul 24, 2023 · Updated 2 years ago
- TensorFlow implementations of recommender systems models for implicit feedback & sequential actions ☆14 · Dec 29, 2018 · Updated 7 years ago
- BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet ☆17 · Dec 5, 2018 · Updated 7 years ago