soumyadipghosh / eventgrad
Event-Triggered Communication in Parallel Machine Learning
☆29 · Updated 4 years ago
Alternatives and similar repositories for eventgrad
Users who are interested in eventgrad are comparing it to the libraries listed below.
- Introduction to CUDA programming ☆129 · Updated 8 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆158 · Updated 2 years ago
- Codebase associated with the PyTorch compiler tutorial ☆47 · Updated 6 years ago
- Customized matrix multiplication kernels ☆57 · Updated 3 years ago
- kmeans clustering with multi-GPU capabilities ☆122 · Updated 2 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ☆22 · Updated 5 years ago
- PyTorch C++ API Documentation ☆243 · Updated this week
- Subset of BLAS routines optimized for NVIDIA GPUs ☆76 · Updated 2 years ago
- MagmaDNN: a simple deep learning framework in C++ ☆51 · Updated 5 years ago
- LLM training in simple, raw C/CUDA ☆110 · Updated last year
- The Foundation for All Legate Libraries ☆233 · Updated this week
- A fast tensor library for C++. ☆11 · Updated 10 years ago
- ☆19 · Updated 3 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆182 · Updated last month
- A library for syntactically rewriting Python programs, pronounced (sinner). ☆67 · Updated 3 years ago
- A GPU performance prediction toolkit for CUDA programs ☆18 · Updated 6 years ago
- C++ API to log data in tensorboard format. ☆82 · Updated 6 months ago
- A tensor-aware point-to-point communication primitive for machine learning ☆283 · Updated last month
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS. ☆50 · Updated 7 years ago
- Example code to create and train a PyTorch model using the new C++ frontend. ☆17 · Updated 6 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆93 · Updated 2 years ago
- A library of GPU kernels for sparse matrix operations. ☆283 · Updated 5 years ago
- PyTorch interface for the IPU ☆181 · Updated 2 years ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆70 · Updated 9 months ago
- A tracing JIT compiler for PyTorch ☆13 · Updated 4 years ago
- A Library for fast Hash Tables on GPUs ☆131 · Updated 3 months ago
- CUDA kernel author's tools ☆115 · Updated 3 years ago
- Benchmark of expression templates libraries ☆43 · Updated 5 years ago
- Full-speed Array of Structures access ☆176 · Updated 2 years ago