ppl-ai / libfabric-efa-demo
☆54 Updated last month
Alternatives and similar repositories for libfabric-efa-demo:
Users who are interested in libfabric-efa-demo are comparing it to the repositories listed below.
- Perplexity GPU Kernels ☆134 Updated this week
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆134 Updated 2 weeks ago
- NVIDIA Inference Xfer Library (NIXL) ☆230 Updated this week
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆116 Updated last year
- CUDA checkpoint and restore utility ☆319 Updated 2 months ago
- A plugin that lets EC2 developers use libfabric as a network provider while running NCCL applications. ☆167 Updated this week
- PyTorch UCC plugin ☆20 Updated 3 years ago
- ☆296 Updated 7 months ago
- NCCL Profiling Kit ☆128 Updated 9 months ago
- KV cache store for distributed LLM inference ☆107 Updated this week
- ☆31 Updated 3 months ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆132 Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆87 Updated 3 weeks ago
- ☆48 Updated 3 weeks ago
- Ultra | Ultimate | Unified CCL ☆57 Updated last month
- Microsoft Collective Communication Library ☆64 Updated 4 months ago
- High-performance safetensors model loader ☆18 Updated this week
- Small scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆127 Updated last year
- ☆44 Updated 3 years ago
- Microsoft Collective Communication Library ☆344 Updated last year
- RDMA and SHARP plugins for the NCCL library ☆185 Updated 2 weeks ago
- Extensible collectives library in Triton ☆84 Updated this week
- A library to analyze PyTorch traces. ☆355 Updated this week
- Synthesizer for optimal collective communication algorithms ☆105 Updated 11 months ago
- An I/O benchmark for deep learning applications ☆82 Updated this week
- Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the… ☆306 Updated last week
- DeepSeek-V3/R1 inference performance simulator ☆98 Updated last week
- A hierarchical collective communications library with portable optimizations ☆32 Updated 3 months ago
- CloudAI Benchmark Framework ☆60 Updated last week
- A low-latency & high-throughput serving engine for LLMs ☆334 Updated 2 months ago