A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
☆299 · Nov 28, 2018 · Updated 7 years ago
Alternatives and similar repositories for cnmem
Users that are interested in cnmem are comparing it to the libraries listed below.
- RAPIDS Memory Manager ☆699 · Jun 4, 2026 · Updated last week
- A rudimentary wrapper around the fast Maxwell kernels for GEMM and convolution operations provided by nervanagpu ☆34 · May 7, 2015 · Updated 11 years ago
- A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology ☆1,380 · Mar 12, 2026 · Updated 2 months ago
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆575 · Sep 15, 2025 · Updated 8 months ago
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,834 · Oct 9, 2023 · Updated 2 years ago
- Source code examples from the Parallel Forall Blog ☆1,330 · Sep 23, 2025 · Updated 8 months ago
- Assembler for NVIDIA Maxwell architecture ☆1,070 · Jan 3, 2023 · Updated 3 years ago
- Optimized primitives for collective multi-GPU communication ☆4,785 · Jun 4, 2026 · Updated last week
- Patterns and behaviors for GPU computing ☆1,778 · Jan 17, 2026 · Updated 4 months ago
- Implementation of vDNN++; an improvement over vDNN ☆18 · Dec 7, 2018 · Updated 7 years ago
- this is the release repository of superneurons ☆54 · Feb 13, 2021 · Updated 5 years ago
- CUDA Data Parallel Primitives Library ☆438 · Nov 9, 2018 · Updated 7 years ago
- Easy benchmarking of all publicly accessible implementations of convnets ☆2,690 · Jun 9, 2017 · Updated 9 years ago
- Sublinear memory optimization for deep learning, reduce GPU memory cost to train deeper nets ☆306 · Aug 8, 2017 · Updated 8 years ago
- Library to manipulate tensors on the GPU. ☆187 · Mar 21, 2023 · Updated 3 years ago
- Caffe: a fast open framework for deep learning. ☆666 · Apr 3, 2023 · Updated 3 years ago
- Implementation of a Tensorflow XLA rematerialization pass ☆15 · Dec 20, 2019 · Updated 6 years ago
- Code and models from the paper "Layer Normalization" ☆243 · Nov 8, 2016 · Updated 9 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description. ☆1,001 · Sep 19, 2024 · Updated last year
- Intel® Deep Learning Framework ☆312 · Jun 16, 2016 · Updated 9 years ago
- Programmable CUDA/C++ GPU Graph Analytics ☆1,091 · Feb 28, 2026 · Updated 3 months ago
- ☆1,649 · Sep 11, 2018 · Updated 7 years ago
- Facebook's CUDA extensions. ☆284 · Mar 27, 2019 · Updated 7 years ago
- Acceleration package for neural networks on multi-core CPUs ☆1,707 · Jun 11, 2024 · Updated 2 years ago
- A Valgrind extension for CUDA, unofficial mirror for https://www.hlrs.de/organization/av/spmt/research/cudagrind/ ☆10 · Aug 5, 2015 · Updated 10 years ago
- Fast Recurrent Networks Library ☆578 · Sep 20, 2016 · Updated 9 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆112 · Mar 11, 2015 · Updated 11 years ago
- NumPy interface with mixed backend execution ☆1,096 · Feb 19, 2018 · Updated 8 years ago
- ☆14 · Apr 26, 2022 · Updated 4 years ago
- ☆13 · Jul 9, 2021 · Updated 4 years ago
- Node based Gui for creating caffe networks ☆103 · Jan 18, 2021 · Updated 5 years ago
- Deep Learning GPU Training System ☆4,178 · Jan 7, 2025 · Updated last year
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆893 · Sep 26, 2025 · Updated 8 months ago
- Computation Graph Toolkit ☆634 · Apr 10, 2018 · Updated 8 years ago
- Low-precision matrix multiplication ☆1,842 · Jan 29, 2024 · Updated 2 years ago
- A Theano framework for building and training neural networks ☆1,152 · Feb 19, 2019 · Updated 7 years ago
- Benchmarking Deep Learning operations on different hardware ☆1,106 · Apr 25, 2021 · Updated 5 years ago
- Deep Reinforcement Learning Agent ☆19 · Dec 9, 2015 · Updated 10 years ago
- Multi-GPU mini-framework for Theano ☆195 · Sep 25, 2017 · Updated 8 years ago