PyTorch extension enabling direct access to cuDNN-accelerated C++ convolution functions.
☆13 · Mar 14, 2021 · Updated 5 years ago
Alternatives and similar repositories for PyTorch-cuDNN-Convolution
Users that are interested in PyTorch-cuDNN-Convolution are comparing it to the libraries listed below.
- Llama causal LM fully recreated in LibTorch. Designed to be used in Unreal Engine 5 ☆16 · Sep 19, 2024 · Updated last year
- A simple script to plot the Roofline model for given HW platforms and applications ☆10 · Mar 17, 2026 · Updated 2 months ago
- Compression primitives for uplink compression in Federated Learning that are compatible with Secure Aggregation. ☆10 · Jul 27, 2022 · Updated 3 years ago
- Code for the paper "A Statistical Framework for Low-bitwidth Training of Deep Neural Networks" ☆29 · Oct 31, 2020 · Updated 5 years ago
- ☆23 · Aug 20, 2025 · Updated 9 months ago
- Find, list, and inspect processes from Go (golang). ☆10 · Feb 4, 2018 · Updated 8 years ago
- Multi Stopwatch for Python ☆12 · Sep 28, 2019 · Updated 6 years ago
- ☆15 · Sep 2, 2020 · Updated 5 years ago
- CASLab-GPU simulator in SystemC ☆11 · May 29, 2020 · Updated 5 years ago
- A TensorFlow.keras implementation of Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorizatio… ☆10 · Dec 18, 2019 · Updated 6 years ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆21 · Jan 24, 2025 · Updated last year
- A direct convolution library targeting ARM multi-core CPUs. ☆12 · Nov 27, 2024 · Updated last year
- Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent ☆16 · Sep 8, 2022 · Updated 3 years ago
- Code needed to reproduce results from my ICLR 2019 paper on fixed-point quantization of the backprop algorithm. ☆10 · Jan 24, 2019 · Updated 7 years ago
- PyTorch code for full quantization of DNNs using BCGD ☆14 · Jul 24, 2019 · Updated 6 years ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆16 · Aug 31, 2023 · Updated 2 years ago
- ☆11 · Dec 8, 2022 · Updated 3 years ago
- An open-source tool for sequence learning in NLP built on TensorFlow. ☆11 · Dec 23, 2021 · Updated 4 years ago
- LLM implementation one matrix multiplication at a time ☆13 · Aug 8, 2024 · Updated last year
- KimiaPath24: Dataset for retrieval and classification in digital pathology ☆13 · Jun 4, 2017 · Updated 8 years ago
- My old book about programming for Symbian 9.x-based smartphones, in Russian ☆14 · Jul 8, 2015 · Updated 10 years ago
- Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs" ☆19 · Jul 11, 2024 · Updated last year
- LaTeX template for Beijing Jiaotong University graduation theses ☆16 · Nov 7, 2014 · Updated 11 years ago
- High Performance Int8 GEMM Kernels for SM80 and later GPUs. ☆23 · Mar 11, 2025 · Updated last year
- Optimizing Deep Convolutional Neural Network with Ternarized Weights and High Accuracy ☆16 · Jan 27, 2019 · Updated 7 years ago
- ☆21 · Dec 27, 2019 · Updated 6 years ago
- A selective knowledge distillation algorithm for efficient speculative decoders ☆39 · Nov 27, 2025 · Updated 5 months ago
- Efficient GPU kernels for mixed-precision Vision Transformers in Triton ☆17 · Sep 18, 2025 · Updated 8 months ago
- Unofficial LaTeX template for Beijing Jiaotong University degree theses ☆11 · Apr 30, 2018 · Updated 8 years ago
- Official repository for the IPDPS'24 paper "QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices". ☆20 · Feb 23, 2024 · Updated 2 years ago
- Zero Dependency LibTorch Safetensors Loading and Storing in C++ ☆23 · Jul 12, 2024 · Updated last year
- ☆10 · Jul 14, 2019 · Updated 6 years ago
- Synchronized Multi-GPU Batch Normalization for PyTorch based on https://github.com/tamakoji/pytorch-syncbn ☆12 · Nov 22, 2018 · Updated 7 years ago
- Torch Frontend for IREE ☆26 · Dec 21, 2023 · Updated 2 years ago
- Sample app to help create the zip file used to train Einstein Vision Object Detection ☆13 · Jun 11, 2018 · Updated 7 years ago
- Share and synchronize your slides ☆13 · Jun 23, 2016 · Updated 9 years ago
- Some dev stuff ☆11 · Mar 5, 2015 · Updated 11 years ago
- Fingerprint recognition using Python ☆14 · Dec 3, 2016 · Updated 9 years ago
- Code for paper 'Minimizing FLOPs to Learn Efficient Sparse Representations' published at ICLR 2020 ☆20 · Feb 14, 2020 · Updated 6 years ago