eliben / cs344
Introduction to Parallel Programming class code
☆30 · Updated 10 years ago
Alternatives and similar repositories for cs344
Users that are interested in cs344 are comparing it to the libraries listed below
- Symbolic differentiation engine for optimization-based machine learning models. ☆43 · Updated 8 years ago
- Randomized Decision Trees: A Fast C++ Implementation of Random Forests. ☆179 · Updated 5 years ago
- Fork of magma to include more BLAS ☆28 · Updated 9 years ago
- Boda: A C++ Framework for Efficient Experiments in Computer Vision ☆64 · Updated 6 years ago
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL ☆137 · Updated 8 years ago
- My solutions to Udacity's Parallel Programming course (CS 344) ☆76 · Updated 8 years ago
- neon tutorials ☆93 · Updated 3 years ago
- Convolutional neural networks C++ framework with CPU and GPU (CUDA) backends ☆182 · Updated 7 years ago
- Parallel Algorithm Scheduling Library ☆107 · Updated 8 years ago
- This project is a simple deep neural network trained using only TensorFlow C++. ☆117 · Updated 2 years ago
- related materials for coursera & edx MOOCs, will no longer update. ☆64 · Updated 9 years ago
- C++ 11 implementation of Geoff Hinton's Deep Learning matlab code ☆286 · Updated 10 years ago
- The "CUDA templates" are a collection of C++ template classes and functions which provide a consistent interface to NVIDIA's "Compute Uni… ☆27 · Updated 14 years ago
- This repository contains easy-to-read Python/CUDA implementations of fundamental GPU computing primitives. ☆36 · Updated 10 years ago
- Scientific library for high-precision computations and research ☆49 · Updated 8 years ago
- Proof-of-Concept CNN in Halide ☆22 · Updated 9 years ago
- A Light-weight and Fast Template Matrix Library ☆132 · Updated 12 years ago
- Papers and blogs related to distributed deep learning ☆96 · Updated 8 years ago
- Resources to work offline on the assignments of Heterogeneous Parallel Programming course from Coursera. ☆72 · Updated 6 years ago
- clang with OpenMP 3.1 and some elements of OpenMP 4.0 support ☆90 · Updated 10 years ago
- tutorial to optimize GEMM performance on android ☆51 · Updated 9 years ago
- Deep neural network framework (C/C++/CUDA). ☆32 · Updated 10 years ago
- (Spring 2017) Assignment 2: GPU Executor ☆63 · Updated 8 years ago
- Generating Families of Practical Fast Matrix Multiplication Algorithms ☆12 · Updated 8 years ago
- Fast matrix multiplication ☆31 · Updated 4 years ago
- Vector Math Library ☆84 · Updated 2 months ago
- A portable high-level API with CUDA or OpenCL back-end ☆55 · Updated 8 years ago
- a heterogeneous multiGPU level-3 BLAS library ☆46 · Updated 6 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆299 · Updated 7 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm) ☆47 · Updated 8 years ago