soumyadipghosh / eventgrad
Event-Triggered Communication in Parallel Machine Learning
☆25 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for eventgrad
- Demo dataset for libtorch · ☆54 · Updated 2 years ago
- Codebase associated with the PyTorch compiler tutorial · ☆44 · Updated 5 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … · ☆63 · Updated 2 years ago
- A tracing JIT compiler for PyTorch · ☆12 · Updated 2 years ago
- ☆18 · Updated 2 years ago
- Example code to create and train a PyTorch model using the new C++ frontend · ☆17 · Updated 5 years ago
- ☆14 · Updated last month
- LLM training in simple, raw C/CUDA · ☆86 · Updated 6 months ago
- Benchmarking some transformer deployments · ☆26 · Updated last year
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020 · ☆22 · Updated 4 years ago
- Benchmarks to capture important workloads · ☆28 · Updated 5 months ago
- Some CUDA design patterns and a bit of template magic for CUDA · ☆146 · Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) · ☆59 · Updated 7 months ago
- Automatically insert NVTX ranges into PyTorch models · ☆17 · Updated 3 years ago
- Image to column · ☆31 · Updated 10 years ago
- CUDA kernel author's tools · ☆108 · Updated 2 years ago
- ☆35 · Updated last year
- C++ API to log data in TensorBoard format · ☆76 · Updated 2 years ago
- ☆31 · Updated 4 years ago
- DLPack for TensorFlow · ☆36 · Updated 4 years ago
- A library for syntactically rewriting Python programs, pronounced (sinner) · ☆70 · Updated 2 years ago
- Introduction to CUDA programming · ☆113 · Updated 7 years ago
- Deadline-based hyperparameter tuning on Ray Tune · ☆31 · Updated 4 years ago
- ☆22 · Updated 10 months ago
- ☆9 · Updated last month
- A basic Docker-based installation of TVM · ☆12 · Updated 2 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API · ☆65 · Updated last year
- End-to-end steps for adding custom ops in PyTorch · ☆19 · Updated 4 years ago
- ☆48 · Updated 8 months ago
- A high-performance system for customized-precision distributed deep learning · ☆12 · Updated 3 years ago