soumyadipghosh / eventgrad
Event-Triggered Communication in Parallel Machine Learning
☆29 · Updated 4 years ago
Alternatives and similar repositories for eventgrad
Users who are interested in eventgrad are comparing it to the libraries listed below.
- Introduction to CUDA programming ☆129 · Updated 8 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆157 · Updated 2 years ago
- MagmaDNN: a simple deep learning framework in c++ ☆51 · Updated 5 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- LLM training in simple, raw C/CUDA ☆108 · Updated last year
- CUDA implementation of exclusive prefix sum via Blelloch's algorithm ☆29 · Updated 8 years ago
- Automatic Differentiation C++ Library ☆58 · Updated 5 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆91 · Updated 2 years ago
- ☆16 · Updated last year
- A tracing JIT compiler for PyTorch ☆13 · Updated 4 years ago
- The Foundation for All Legate Libraries ☆233 · Updated this week
- Codebase associated with the PyTorch compiler tutorial ☆47 · Updated 6 years ago
- ☆19 · Updated 3 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆74 · Updated 2 years ago
- NumPy-compatible multidimensional arrays in C++ ☆163 · Updated last year
- PyTorch C++ API Documentation ☆242 · Updated this week
- A tensor-aware point-to-point communication primitive for machine learning ☆280 · Updated last week
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ☆22 · Updated 5 years ago
- A library of GPU kernels for sparse matrix operations. ☆280 · Updated 5 years ago
- A Library for fast Hash Tables on GPUs ☆130 · Updated 2 months ago
- ArrayFire's Machine Learning Library. ☆105 · Updated 7 years ago
- Customized matrix multiplication kernels ☆57 · Updated 3 years ago
- Parallel network flows using OpenMP and CUDA. ☆28 · Updated 7 years ago
- kmeans clustering with multi-GPU capabilities ☆120 · Updated 2 years ago
- C++ API to log data in tensorboard format. ☆82 · Updated 5 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆55 · Updated this week
- a CUDA implementation of a priority queue ☆84 · Updated 5 years ago
- Full-speed Array of Structures access ☆176 · Updated 2 years ago
- A fast tensor library for c++. ☆11 · Updated 10 years ago
- CUDA kernel author's tools ☆115 · Updated 3 years ago