soumyadipghosh / eventgrad
Event-Triggered Communication in Parallel Machine Learning
☆29 · Updated 3 years ago
Alternatives and similar repositories for eventgrad
Users who are interested in eventgrad are comparing it to the libraries listed below.
- Some CUDA design patterns and a bit of template magic for CUDA ☆157 · Updated 2 years ago
- Introduction to CUDA programming ☆129 · Updated 8 years ago
- LLM training in simple, raw C/CUDA ☆108 · Updated last year
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- Example code to create and train a Pytorch model using the new C++ frontend. ☆17 · Updated 6 years ago
- Codebase associated with the PyTorch compiler tutorial ☆47 · Updated 6 years ago
- MagmaDNN: a simple deep learning framework in c++ ☆51 · Updated 5 years ago
- kmeans clustering with multi-GPU capabilities ☆119 · Updated 2 years ago
- CUDA implementation of exclusive prefix sum via Blelloch's algorithm ☆29 · Updated 8 years ago
- A Library for fast Hash Tables on GPUs ☆127 · Updated last month
- PyTorch interface for the IPU ☆181 · Updated 2 years ago
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ☆22 · Updated 5 years ago
- A tensor-aware point-to-point communication primitive for machine learning ☆275 · Updated last month
- ArrayFire's Machine Learning Library. ☆105 · Updated 7 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆182 · Updated 3 months ago
- ☆31 · Updated 5 years ago
- PyTorch C++ API Documentation ☆241 · Updated this week
- Customized matrix multiplication kernels ☆57 · Updated 3 years ago
- Loop Nest - Linear algebra compiler and code generator. ☆21 · Updated 3 years ago
- CUDA kernel author's tools ☆114 · Updated 3 years ago
- An Aspiring Drop-In Replacement for Pandas at Scale ☆74 · Updated 4 years ago
- Benchmark of expression templates libraries ☆42 · Updated 5 years ago
- The Foundation for All Legate Libraries ☆232 · Updated this week
- Parallel network flows using OpenMP and CUDA. ☆28 · Updated 7 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆74 · Updated 2 years ago
- A library for syntactically rewriting Python programs, pronounced (sinner). ☆68 · Updated 3 years ago
- ☆19 · Updated 3 years ago
- PyTorch RFCs (experimental) ☆136 · Updated 6 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆52 · Updated this week
- benchmarking some transformer deployments ☆26 · Updated last week