soumyadipghosh / eventgrad
Event-Triggered Communication in Parallel Machine Learning
☆28 · Updated 3 years ago
Alternatives and similar repositories for eventgrad
Users who are interested in eventgrad are comparing it to the libraries listed below.
- ☆9 · Updated 9 months ago
- LLM training in simple, raw C/CUDA ☆99 · Updated last year
- Some CUDA design patterns and a bit of template magic for CUDA ☆155 · Updated 2 years ago
- Example code to create and train a Pytorch model using the new C++ frontend. ☆17 · Updated 6 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- A tracing JIT compiler for PyTorch ☆13 · Updated 3 years ago
- A tensor-aware point-to-point communication primitive for machine learning ☆259 · Updated 2 years ago
- Customized matrix multiplication kernels ☆56 · Updated 3 years ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆47 · Updated this week
- A parallel framework for training deep neural networks ☆62 · Updated 4 months ago
- ☆13 · Updated 4 years ago
- Codebase associated with the PyTorch compiler tutorial ☆46 · Updated 5 years ago
- ☆28 · Updated 6 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆158 · Updated 3 weeks ago
- Parallel network flows using OpenMP and CUDA. ☆28 · Updated 6 years ago
- The Foundation for All Legate Libraries ☆218 · Updated this week
- Introduction to CUDA programming ☆123 · Updated 8 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆180 · Updated last week
- Deadline-based hyperparameter tuning on RayTune. ☆31 · Updated 5 years ago
- PyTorch interface for the IPU ☆180 · Updated last year
- Benchmarks to capture important workloads. ☆31 · Updated 5 months ago
- PyTorch RFCs (experimental) ☆133 · Updated last month
- FAST Randomized SVD on a GPU with CUDA 🏎️ ☆12 · Updated 6 years ago
- PyTorch C++ API Documentation ☆231 · Updated this week
- benchmarking some transformer deployments ☆26 · Updated 2 years ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆63 · Updated 3 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆85 · Updated last year
- ☆18 · Updated 2 years ago
- Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers ☆142 · Updated 7 months ago
- Fast, multithreaded, AVX/FMA matrix multiplication kernel in C++ 17 ☆18 · Updated 6 years ago