soumyadipghosh / eventgrad
Event-Triggered Communication in Parallel Machine Learning
☆30 · Updated 3 years ago
Alternatives and similar repositories for eventgrad
Users who are interested in eventgrad are comparing it to the libraries listed below.
- A library for syntactically rewriting Python programs, pronounced (sinner). ☆68 · Updated 3 years ago
- Introduction to CUDA programming ☆126 · Updated 8 years ago
- PyTorch interface for the IPU ☆181 · Updated 2 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- A library of GPU kernels for sparse matrix operations. ☆274 · Updated 4 years ago
- The Foundation for All Legate Libraries ☆228 · Updated last week
- LLM training in simple, raw C/CUDA ☆105 · Updated last year
- A tensor-aware point-to-point communication primitive for machine learning ☆273 · Updated last month
- Full-speed Array of Structures access ☆173 · Updated 2 years ago
- ☆31 · Updated 5 years ago
- benchmarking some transformer deployments ☆26 · Updated 2 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆156 · Updated 2 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆88 · Updated last year
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆136 · Updated 3 years ago
- Codebase associated with the PyTorch compiler tutorial ☆46 · Updated 6 years ago
- DLPack for Tensorflow ☆35 · Updated 5 years ago
- A tracing JIT compiler for PyTorch ☆13 · Updated 3 years ago
- A Library for fast Hash Tables on GPUs ☆126 · Updated last week
- PyTorch RFCs (experimental) ☆135 · Updated 4 months ago
- PyTorch C++ API Documentation ☆240 · Updated this week
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆180 · Updated last month
- Subset of BLAS routines optimized for NVIDIA GPUs ☆73 · Updated 2 years ago
- A fast tensor library for c++. ☆11 · Updated 10 years ago
- MagmaDNN: a simple deep learning framework in c++ ☆50 · Updated 5 years ago
- sparse matrix pre-processing library ☆83 · Updated last year
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS. ☆50 · Updated 7 years ago
- CUDA kernel author's tools ☆113 · Updated 3 years ago
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ☆22 · Updated 5 years ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆120 · Updated 10 months ago
- ArrayFire's Machine Learning Library. ☆105 · Updated 7 years ago