soumyadipghosh / eventgrad
Event-Triggered Communication in Parallel Machine Learning
☆28 · Updated 3 years ago
Alternatives and similar repositories for eventgrad
Users interested in eventgrad are comparing it to the libraries listed below.
- ☆32 · Updated 4 years ago
- Benchmarks to capture important workloads. ☆31 · Updated 5 months ago
- CUDA implementation of exclusive prefix sum via Blelloch's algorithm ☆28 · Updated 7 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ☆22 · Updated 4 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 11 months ago
- Numerically Solving Parametric Families of High-Dimensional Kolmogorov Partial Differential Equations via Deep Learning (NeurIPS 2020) ☆22 · Updated 2 years ago
- ☆16 · Updated 9 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 · Updated last year
- LLM training in simple, raw C/CUDA ☆99 · Updated last year
- Some CUDA design patterns and a bit of template magic for CUDA ☆154 · Updated 2 years ago
- Personal solutions to the Triton Puzzles ☆19 · Updated 11 months ago
- ☆29 · Updated 2 years ago
- Fast, multithreaded, AVX/FMA matrix multiplication kernel in C++17 ☆18 · Updated 6 years ago
- A user-friendly toolchain that enables the seamless execution of ONNX models using JAX as the backend. ☆114 · Updated this week
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆47 · Updated this week
- Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers ☆142 · Updated 6 months ago
- A parallel framework for training deep neural networks ☆61 · Updated 3 months ago
- A Library for fast Hash Tables on GPUs ☆124 · Updated 2 years ago
- High-dimensional black-box optimizer using Latent Action Monte Carlo Tree Search algorithm ☆28 · Updated 2 years ago
- ☆18 · Updated 2 years ago
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆46 · Updated 3 years ago
- CUDA accelerated medical imaging algorithms ☆14 · Updated 3 years ago
- CUDA kernel author's tools ☆111 · Updated 3 years ago
- A bunch of kernels that might make stuff slower 😉 ☆53 · Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆90 · Updated 3 weeks ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆61 · Updated 2 months ago
- A minimalistic header-only C++11 Neural Network library based on Eigen::Tensor ☆20 · Updated 7 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆83 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆16 · Updated 7 months ago