Optimized parallel tiled approach to 2D convolution that takes advantage of lower-latency, higher-bandwidth shared memory, as well as global constant memory cached aggressively within GPU thread blocks.
☆15 · Oct 17, 2017 · Updated 8 years ago
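The tiling idea in the description can be illustrated with a small CPU-side sketch. This is not the repository's code and not CUDA: it is a plain Python emulation of the access pattern, in which each outer-loop iteration stands in for one thread block, the `shared` buffer stands in for shared memory, and the read-only `mask` plays the role of constant memory. The names `conv2d_tiled`, `conv2d_direct`, and `TILE` are hypothetical, and the sketch assumes an odd, square mask, zero padding at the borders, and no mask flip (so strictly it computes a cross-correlation, as many teaching kernels do).

```python
TILE = 4  # hypothetical output tile size; real kernels often use 16 or 32


def conv2d_direct(image, mask):
    """Reference version: every output reads its inputs straight from `image`."""
    h, w = len(image), len(image[0])
    k = len(mask)
    r = k // 2  # mask radius (assumes k is odd)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - r, x + j - r
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                        acc += image[yy][xx] * mask[i][j]
            out[y][x] = acc
    return out


def conv2d_tiled(image, mask, tile=TILE):
    """Tiled version: stage each tile plus its halo once, then compute from it."""
    h, w = len(image), len(image[0])
    k = len(mask)
    r = k // 2
    side = tile + 2 * r  # tile width plus a halo of r cells on each side
    out = [[0.0] * w for _ in range(h)]
    for ty in range(0, h, tile):          # one (ty, tx) pair ~ one thread block
        for tx in range(0, w, tile):
            # Load phase: copy the tile and its halo into a local buffer,
            # the stand-in for shared memory. Out-of-range cells stay 0.0.
            shared = [[0.0] * side for _ in range(side)]
            for sy in range(side):
                for sx in range(side):
                    yy, xx = ty + sy - r, tx + sx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        shared[sy][sx] = image[yy][xx]
            # Compute phase: every output in the tile reads only `shared`.
            for oy in range(min(tile, h - ty)):
                for ox in range(min(tile, w - tx)):
                    acc = 0.0
                    for i in range(k):
                        for j in range(k):
                            acc += shared[oy + i][ox + j] * mask[i][j]
                    out[ty + oy][tx + ox] = acc
    return out
```

In the emulation, each input cell is copied into `shared` once per tile and then reused by up to k × k neighbouring outputs, which is the data reuse that makes on-chip shared memory worthwhile on a GPU. In an actual CUDA kernel the load and compute phases would be separated by a block-level barrier, which this sequential sketch does not need.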
Alternatives and similar repositories for cuda-tiled-2D-convolution
Users who are interested in cuda-tiled-2D-convolution are comparing it to the libraries listed below.
- A simple 2D convolution written in CUDA C that uses shared memory for better performance ☆20 · Apr 12, 2018 · Updated 8 years ago
- Optimized Parallel Tiled Approach to perform Matrix Multiplication by taking advantage of the lower latency, higher bandwidth shared memo… ☆16 · Sep 24, 2017 · Updated 8 years ago
- Repository for all balance bot related code ☆22 · Jun 20, 2025 · Updated 10 months ago
- Sample overlays and configuration files to assist with running Zephyr samples on Xiao boards ☆11 · Jun 6, 2024 · Updated last year
- C program for drawing complex graphics with L-edit ☆10 · Jan 7, 2020 · Updated 6 years ago
- Static-sized long-precision arithmetic library for use inside GPU parallelization with CUDA ☆11 · Apr 5, 2025 · Updated last year
- ☆11 · Feb 3, 2026 · Updated 3 months ago
- Extension of Convex.jl for disciplined multiconvex optimization ☆10 · Feb 22, 2017 · Updated 9 years ago
- "Unix & Linux Shell Script Exercise Dictionary" (유닉스 리눅스 셸 스크립트 예제 사전), Hanbit Media ☆10 · Jan 17, 2017 · Updated 9 years ago
- ☆21 · Jan 23, 2026 · Updated 3 months ago
- PETSc Interface for Octave and MATLAB (Deprecated) ☆10 · Nov 10, 2022 · Updated 3 years ago
- MATLAB MEX wrappers to cuSPARSE (NVIDIA) ☆11 · Dec 10, 2025 · Updated 4 months ago
- A CUDA-based voxelizer used in acoustics FDTD calculations. ☆11 · Dec 10, 2020 · Updated 5 years ago
- 2D and 3D Matrix Convolution and Matrix Multiplication with CUDA ☆10 · Jun 14, 2021 · Updated 4 years ago
- Zephyr driver for PCF85063A ☆11 · Jan 13, 2026 · Updated 3 months ago
- Simple async job management service using gRPC ☆16 · Apr 23, 2021 · Updated 5 years ago
- Irene is a Python package that aims to be a toolkit for global optimization problems that can be realized algebraically. It generalizes L… ☆15 · Apr 28, 2026 · Updated last week
- hugo-with-github-issues ☆12 · Jan 17, 2023 · Updated 3 years ago
- ☆14 · Jul 25, 2023 · Updated 2 years ago
- A machine learning library capable of training various deep neural networks (RNNs, LSTMs, DBNs, etc.) on a GPU. It makes use of auto-di… ☆10 · Aug 28, 2018 · Updated 7 years ago
- ☆33 · Oct 2, 2025 · Updated 7 months ago
- Inline PTX Assembly in CUDA example ☆14 · May 7, 2022 · Updated 3 years ago
- A Minimalist Asynchronous Toolkit (AMAST) is a small and efficient C99 library that helps manage complex, event-driven programs. It combi… ☆25 · Apr 4, 2026 · Updated last month
- ☆15 · Aug 9, 2023 · Updated 2 years ago
- GPU monitor for CUDA devices ☆14 · Mar 3, 2013 · Updated 13 years ago
- Notes on the YouTube lecture series "2017 Numerical methods of PDE", given by Qiqi Wang ☆14 · Jun 18, 2018 · Updated 7 years ago
- Multiple-precision arithmetic program for the second semester of the 3J "Algorithms and Data Structures" course at Nagano KOSEN (長野高専) ☆21 · Mar 1, 2018 · Updated 8 years ago
- This example shows how to perform quantization-aware training for a transfer-learned MobileNet-v2 network. ☆12 · Dec 19, 2023 · Updated 2 years ago
- Cloud-Barista Multi-Cloud Application Runtime Framework: Support Multi-Cloud Kubernetes Service ☆12 · Sep 13, 2025 · Updated 7 months ago
- ☆16 · Apr 2, 2023 · Updated 3 years ago
- Stream processing library, inspired by the Kafka Streams DSL, that abstracts the pipeline pattern using generics. ☆15 · Dec 21, 2022 · Updated 3 years ago
- Automated Discovery and Optimization of 3D Topological Photonic Crystals ☆11 · Mar 16, 2023 · Updated 3 years ago
- Performant kernels for symmetric tensors ☆16 · Aug 22, 2024 · Updated last year
- ☆11 · Mar 17, 2022 · Updated 4 years ago
- Cross-Platform object detection using TensorFlow Lite and OpenCV in C++ ☆18 · Apr 26, 2020 · Updated 6 years ago
- This repository contains the source code for SoftCTC. The original paper can be found here: https://arxiv.org/abs/2212.02135 ☆19 · Mar 7, 2023 · Updated 3 years ago
- A library to define abstract linear operators, and associated algebra and matrix-free algorithms, that works with PyTorch Tensors. ☆16 · Dec 7, 2025 · Updated 4 months ago
- Golang Boilerplate for OpenAI + PostgreSQL + go-chi ☆19 · Apr 9, 2023 · Updated 3 years ago
- Some basic algorithms explored on the Yee grid in FDTD ☆14 · Sep 22, 2018 · Updated 7 years ago