Accelerating DNN Convolutional Layers with Micro-batches
☆63Apr 30, 2020Updated 5 years ago
Alternatives and similar repositories for ucudnn
Users that are interested in ucudnn are comparing it to the libraries listed below.
- Squeeze-unet Semantic Segmentation for embedded devices☆29Apr 13, 2018Updated 8 years ago
- A Deep Learning Meta-Framework and HPC Benchmarking Library☆81May 23, 2022Updated 3 years ago
- Haystack is an analytical cache model that, given a program, computes the number of cache misses.☆46Jul 15, 2019Updated 6 years ago
- Dual-way gradient sparsification approach for async DNN training, based on PyTorch.☆11Dec 8, 2022Updated 3 years ago
- A CUDA-accelerated utility for using HyperLogLogs for cardinality estimation☆19Dec 26, 2012Updated 13 years ago
- Compiler toolchain to enable generation of high-level DSLs for geophysical fluid dynamics models☆29Mar 22, 2023Updated 3 years ago
- ☆15Jul 7, 2020Updated 5 years ago
- ☆12Sep 29, 2017Updated 8 years ago
- Script to check ONNX model compatibility against TensorRT versions using docker images☆12Nov 23, 2023Updated 2 years ago
- A minimal cuDNN deep learning training code sample using LeNet.☆268Jul 30, 2023Updated 2 years ago
- ONNX SEA-RAFT, optical flow☆14Jan 5, 2026Updated 3 months ago
- Absinthe is an optimization framework to fuse and tile stencil codes in one shot☆14Jul 17, 2019Updated 6 years ago
- Code for reproducing the work of the ICML 2019 paper: Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Appli…☆12Jun 8, 2019Updated 6 years ago
- Implementation of vDNN++; an improvement over vDNN☆18Dec 7, 2018Updated 7 years ago
- Mask R-CNN implementation using Chainer☆14Jun 12, 2018Updated 7 years ago
- Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware☆15Mar 1, 2022Updated 4 years ago
- ☆13Oct 10, 2018Updated 7 years ago
- Torch FFI-bindings for NNPACK☆31May 26, 2017Updated 8 years ago
- Self-learning hands-on for Chainer by Jupyter notebook☆43Feb 14, 2017Updated 9 years ago
- CHIPKIT: An agile, reusable open-source framework for rapid test chip development☆42May 24, 2020Updated 5 years ago
- Assembly-optimized Marvin32 hash function☆12Jan 17, 2024Updated 2 years ago
- Nonblocking data structures☆12Jan 25, 2015Updated 11 years ago
- News in Privacy-Preserving Machine Learning☆12Feb 5, 2020Updated 6 years ago
- DeepPerf is a set of CUDA assembly development tools☆10Dec 19, 2018Updated 7 years ago
- Chunky Loop Interaction☆25Aug 13, 2019Updated 6 years ago
- This is the open-source version of TinyTS. The code is dirty so far. We may clean the code in the future.☆21Aug 11, 2025Updated 8 months ago
- 📝 "Synthesizing Benchmarks for Predictive Modeling" (🥇 CGO'17 Best Paper)☆22Feb 10, 2023Updated 3 years ago
- An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluations☆19Apr 14, 2020Updated 6 years ago
- This repository contains the PyTorch scripts to train mixed-precision networks for microcontroller deployment, based on the memory contr…☆51May 9, 2024Updated last year
- Question Dependent Recurrent Entity Network☆13Sep 21, 2017Updated 8 years ago
- TP-PARSEC: A Task Parallel PARSEC Benchmark Suite☆11Nov 1, 2020Updated 5 years ago
- Directed Acyclic Graphs With Modern Fortran☆11May 25, 2023Updated 2 years ago
- Minimum viable code for the Decodable Information Bottleneck paper. PyTorch implementation.☆11Oct 20, 2020Updated 5 years ago
- Branch Predictor Optimization for BlackParrot☆15Mar 24, 2024Updated 2 years ago
- Anomaly Detection in computer vision☆21May 21, 2020Updated 5 years ago
- ☆17Sep 15, 2021Updated 4 years ago
- Data Dependence Analyzer in the Polyhedral Model☆21Nov 2, 2023Updated 2 years ago
- ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.☆27Jul 6, 2023Updated 2 years ago
- Code for the Reset-free Trial and Error learning paper (RTE) experiments☆10Jan 3, 2018Updated 8 years ago