soumyadipghosh / eventgrad
Event-Triggered Communication in Parallel Machine Learning
☆25 · Updated 3 years ago
Alternatives and similar repositories for eventgrad:
Users interested in eventgrad are comparing it to the libraries listed below.
- LLM training in simple, raw C/CUDA ☆92 · Updated 11 months ago
- Codebase associated with the PyTorch compiler tutorial ☆45 · Updated 5 years ago
- DLPack for TensorFlow ☆35 · Updated 5 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- Benchmarks to capture important workloads. ☆31 · Updated 2 months ago
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ☆22 · Updated 4 years ago
- ☆26 · Updated last year
- ☆39 · Updated last year
- Customized matrix multiplication kernels ☆54 · Updated 3 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆79 · Updated last year
- Implementation of a TensorFlow XLA rematerialization pass ☆15 · Updated 5 years ago
- ☆18 · Updated 2 years ago
- Distributed Bayesian Optimization ☆23 · Updated 4 years ago
- Automatically insert NVTX ranges into PyTorch models ☆17 · Updated 3 years ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 9 months ago
- Introduction to CUDA programming ☆116 · Updated 7 years ago
- Implementations and checkpoints for ResNet, Wide ResNet, ResNeXt, ResNet-D, and ResNeSt in JAX (Flax). ☆109 · Updated 2 years ago
- ☆27 · Updated 3 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆44 · Updated this week
- A tensor-aware point-to-point communication primitive for machine learning ☆257 · Updated 2 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆180 · Updated 4 months ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆26 · Updated last week
- GEMM and Winograd based convolutions using CUTLASS ☆26 · Updated 4 years ago
- ☆29 · Updated 2 years ago
- 👑 PyTorch code for the Nero optimiser. ☆20 · Updated 2 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆150 · Updated last year
- Deadline-based hyperparameter tuning on RayTune. ☆31 · Updated 5 years ago
- Benchmarking some transformer deployments ☆26 · Updated 2 years ago
- JMP is a Mixed Precision library for JAX. ☆194 · Updated 2 months ago