soumyadipghosh / eventgrad
Event-Triggered Communication in Parallel Machine Learning
☆28 Updated 3 years ago
Alternatives and similar repositories for eventgrad
Users interested in eventgrad are comparing it to the libraries listed below.
- LLM training in simple, raw C/CUDA ☆99 Updated last year
- A parallel framework for training deep neural networks ☆60 Updated 2 months ago
- Benchmarks to capture important workloads. ☆31 Updated 4 months ago
- CUDA implementation of exclusive prefix sum via Blelloch's algorithm ☆28 Updated 7 years ago
- ☆18 Updated 2 years ago
- A Repository with C++ implementations of Reinforcement Learning Algorithms (Pytorch) ☆96 Updated 5 years ago
- CUDA kernel author's tools ☆111 Updated 3 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆154 Updated 2 years ago
- Codebase associated with the PyTorch compiler tutorial ☆45 Updated 5 years ago
- ☆16 Updated 8 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆47 Updated last week
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- benchmarking some transformer deployments ☆26 Updated 2 years ago
- Fast, multithreaded, AVX/FMA matrix multiplication kernel in C++ 17 ☆18 Updated 6 years ago
- A bunch of kernels that might make stuff slower 😉 ☆48 Updated this week
- ☆32 Updated 4 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆82 Updated last year
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆60 Updated last month
- Customized matrix multiplication kernels ☆54 Updated 3 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 Updated 3 years ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆45 Updated 2 weeks ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆44 Updated 10 months ago
- MagmaDNN: a simple deep learning framework in c++ ☆49 Updated 4 years ago
- ☆28 Updated 4 months ago
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ☆22 Updated 4 years ago
- ☆26 Updated last year
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68 Updated 2 years ago
- Loop Nest - Linear algebra compiler and code generator. ☆22 Updated 2 years ago
- Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers ☆142 Updated 5 months ago