An optimized parallel tiled approach to 2D convolution that takes advantage of the lower-latency, higher-bandwidth shared memory, as well as global constant memory cached aggressively within GPU thread blocks.
☆15 · Oct 17, 2017 · Updated 8 years ago
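The description above names three ingredients: output tiles, shared memory for the input tile, and constant memory for the mask. A minimal sketch of how they typically fit together follows. It is not taken from the repository: the kernel name `conv2dTiled`, the sizes `TILE_WIDTH` and `MASK_WIDTH`, zero padding at the image border, and the unflipped (cross-correlation style) mask are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// All names and sizes below are illustrative assumptions, not the repository's code.
#define MASK_WIDTH  5                               // assumed odd mask size
#define MASK_RADIUS (MASK_WIDTH / 2)
#define TILE_WIDTH  16                              // assumed output tile size
#define BLOCK_WIDTH (TILE_WIDTH + MASK_WIDTH - 1)   // input tile including halo

// The mask is read by every thread, so it is placed in cached constant memory.
__constant__ float d_mask[MASK_WIDTH * MASK_WIDTH];

__global__ void conv2dTiled(const float* in, float* out, int width, int height) {
    __shared__ float tile[BLOCK_WIDTH][BLOCK_WIDTH];

    int tx = threadIdx.x, ty = threadIdx.y;
    int outCol = blockIdx.x * TILE_WIDTH + tx;
    int outRow = blockIdx.y * TILE_WIDTH + ty;
    int inCol = outCol - MASK_RADIUS;   // shift by the halo to find the input element
    int inRow = outRow - MASK_RADIUS;

    // Each thread loads one input element; out-of-range elements are zero (assumed padding).
    if (inRow >= 0 && inRow < height && inCol >= 0 && inCol < width)
        tile[ty][tx] = in[inRow * width + inCol];
    else
        tile[ty][tx] = 0.0f;
    __syncthreads();

    // Only the first TILE_WIDTH x TILE_WIDTH threads of the block produce an output.
    if (tx < TILE_WIDTH && ty < TILE_WIDTH && outRow < height && outCol < width) {
        float acc = 0.0f;
        for (int i = 0; i < MASK_WIDTH; ++i)
            for (int j = 0; j < MASK_WIDTH; ++j)
                acc += d_mask[i * MASK_WIDTH + j] * tile[ty + i][tx + j];
        out[outRow * width + outCol] = acc;
    }
}

int main() {
    const int width = 50, height = 37;   // deliberately not multiples of TILE_WIDTH
    const int n = width * height;
    const size_t bytes = n * sizeof(float);

    float h_mask[MASK_WIDTH * MASK_WIDTH];
    for (int i = 0; i < MASK_WIDTH * MASK_WIDTH; ++i)
        h_mask[i] = 1.0f / (MASK_WIDTH * MASK_WIDTH);   // box filter as sample data

    float* h_in  = (float*)malloc(bytes);
    float* h_out = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = (float)(i % 7);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_mask, h_mask, sizeof(h_mask));

    dim3 block(BLOCK_WIDTH, BLOCK_WIDTH);
    dim3 grid((width + TILE_WIDTH - 1) / TILE_WIDTH,
              (height + TILE_WIDTH - 1) / TILE_WIDTH);
    conv2dTiled<<<grid, block>>>(d_in, d_out, width, height);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    // Compare against a straightforward CPU reference with the same zero padding.
    float maxErr = 0.0f;
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c) {
            float acc = 0.0f;
            for (int i = 0; i < MASK_WIDTH; ++i)
                for (int j = 0; j < MASK_WIDTH; ++j) {
                    int rr = r + i - MASK_RADIUS, cc = c + j - MASK_RADIUS;
                    if (rr >= 0 && rr < height && cc >= 0 && cc < width)
                        acc += h_mask[i * MASK_WIDTH + j] * h_in[rr * width + cc];
                }
            maxErr = fmaxf(maxErr, fabsf(acc - h_out[r * width + c]));
        }
    printf("max abs error vs CPU reference: %g\n", maxErr);

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return maxErr < 1e-4f ? 0 : 1;
}
```

In this sketch the thread block is sized to the input tile (output tile plus halo), so every thread performs exactly one global load and the halo threads simply sit out the compute phase. Another common layout sizes the block to the output tile and has some threads load several elements; which one the repository uses is not stated here.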
Alternatives and similar repositories for cuda-tiled-2D-convolution
Users that are interested in cuda-tiled-2D-convolution are comparing it to the libraries listed below.
- A simple 2D convolution written in CUDA C that uses shared memory for better performance ☆20 · Apr 12, 2018 · Updated 8 years ago
- Simple CUDA C application for NVIDIA GPUs ☆11 · Jun 7, 2022 · Updated 3 years ago
- An automatic test pattern generation (ATPG) and fault simulation system. ☆12 · Sep 9, 2019 · Updated 6 years ago
- Optimized Parallel Tiled Approach to perform Matrix Multiplication by taking advantage of the lower latency, higher bandwidth shared memo… ☆16 · Sep 24, 2017 · Updated 8 years ago
- C program for drawing complex graphics with L-edit ☆10 · Jan 7, 2020 · Updated 6 years ago
- Static-sized long-precision arithmetic library for use inside GPU parallelization with CUDA ☆11 · Apr 5, 2025 · Updated last year
- ☆11 · Feb 3, 2026 · Updated 2 months ago
- Matlab mex wrappers to cuSPARSE (NVIDIA) ☆11 · Dec 10, 2025 · Updated 4 months ago
- 2D and 3D Matrix Convolution and Matrix Multiplication with CUDA ☆10 · Jun 14, 2021 · Updated 4 years ago
- ☆15 · Mar 15, 2022 · Updated 4 years ago
- Zephyr driver for PCF85063A ☆11 · Jan 13, 2026 · Updated 3 months ago
- Simple async job management service using gRPC ☆16 · Apr 23, 2021 · Updated 4 years ago
- FDTD 3D simulator that generates s-parameters from OFF geometry files using one or more GPUs ☆15 · Jan 16, 2023 · Updated 3 years ago
- Code Generation Based High Speed Data Serialization Tool ☆12 · Dec 27, 2022 · Updated 3 years ago
- hugo-with-github-issues ☆12 · Jan 17, 2023 · Updated 3 years ago
- An open source first-order MATLAB solver for conic programs with row sparsity. ☆11 · May 30, 2017 · Updated 8 years ago
- ExBLAS: fast, accurate, and reproducible BLAS ☆17 · Sep 13, 2021 · Updated 4 years ago
- An open-source interface to use the multiple-precision solver SDPA-GMP with YALMIP ☆11 · Apr 8, 2021 · Updated 5 years ago
- ☆14 · Jul 25, 2023 · Updated 2 years ago
- Inline PTX Assembly in CUDA example ☆13 · May 7, 2022 · Updated 3 years ago
- A Minimalist Asynchronous Toolkit (AMAST) is a small and efficient C99 library that helps manage complex, event-driven programs. It combi… ☆25 · Apr 4, 2026 · Updated last week
- GPU monitor for CUDA devices ☆14 · Mar 3, 2013 · Updated 13 years ago
- Notes on the YouTube lecture series "2017 Numerical methods of PDE", given by Qiqi Wang ☆14 · Jun 18, 2018 · Updated 7 years ago
- Multiple-precision arithmetic program for the second-semester 3J course "Algorithms and Data Structures" at NIT, Nagano College ☆21 · Mar 1, 2018 · Updated 8 years ago
- This example shows how to perform quantization-aware training for a transfer-learned MobileNet-v2 network. ☆12 · Dec 19, 2023 · Updated 2 years ago
- ☆16 · Apr 2, 2023 · Updated 3 years ago
- Kafka Streams DSL-inspired stream processing library abstracting the pipeline pattern using generics. ☆15 · Dec 21, 2022 · Updated 3 years ago
- ☆11 · Mar 17, 2022 · Updated 4 years ago
- Integrating Devito operators into PyTorch ☆13 · Mar 17, 2021 · Updated 5 years ago
- A library to define abstract linear operators, and associated algebra and matrix-free algorithms, that works with PyTorch Tensors. ☆16 · Dec 7, 2025 · Updated 4 months ago
- Golang Boilerplate for OpenAI + PostgreSQL + go-chi ☆19 · Apr 9, 2023 · Updated 3 years ago
- Matlab codes that solve Maxwell's equations with some light-matter interactions using the finite difference time domain (FDTD) method ☆10 · Aug 7, 2019 · Updated 6 years ago
- Presentation materials from the July 2018 Golang Korea meetup ☆14 · Jul 26, 2018 · Updated 7 years ago
- Autonomous Patrolling ☆11 · Dec 12, 2017 · Updated 8 years ago
- Feedforward Sequential Memory Networks ☆16 · Aug 2, 2022 · Updated 3 years ago
- Fork of the Rust concurrent hash map benchmarks to include the leapfrog map. ☆14 · Mar 13, 2022 · Updated 4 years ago
- C modularity step by step ☆17 · Aug 5, 2025 · Updated 8 months ago
- Certifiably globally optimal unit quaternion rotation averaging via Sparse Bounded-degree sum of squares optimization. ☆17 · Apr 4, 2019 · Updated 7 years ago
- The aim of this project is to publish and archive newsletters to a target email address. ☆21 · Jan 13, 2024 · Updated 2 years ago