paperg / NCCL_GP
Decoupled from hardware; used to learn some NCCL mechanisms
☆24Updated last year
Alternatives and similar repositories for NCCL_GP
Users who are interested in NCCL_GP are comparing it to the libraries listed below
- ☆229Updated last week
- ☆217Updated 2 years ago
- example code for using DC QP for providing RDMA READ and WRITE operations to remote GPU memory☆152Updated last year
- ☆30Updated last year
- DeepSeek-V3/R1 inference performance simulator☆175Updated 9 months ago
- RDMA and SHARP plugins for nccl library☆218Updated last month
- Repository for MLCommons Chakra schema and tools☆146Updated 2 months ago
- NCCL Profiling Kit☆150Updated last year
- ☆42Updated last year
- TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches☆79Updated 2 years ago
- ☆22Updated 10 months ago
- ☆40Updated 4 years ago
- Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation☆39Updated last month
- ☆71Updated last week
- Synthesizer for optimal collective communication algorithms☆122Updated last year
- ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale☆505Updated this week
- ☆47Updated last year
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling☆66Updated last year
- Hooked CUDA-related dynamic libraries by using automated code generation tools.☆172Updated 2 years ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…☆287Updated 4 months ago
- ☆92Updated 9 months ago
- Open source version of DOCA GPUNetIO and DOCA Verbs libraries (limited features) to enable GDAKI technology on RDMA (IB and RoCE)☆23Updated 3 weeks ago
- Venus Collective Communication Library, supported by SII and Infrawaves.☆129Updated 2 weeks ago
- Simulating Distributed Training at Scale☆14Updated 3 months ago
- ☆91Updated 4 months ago
- ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage☆60Updated last month
- RDMA example☆231Updated 3 years ago
- Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport☆73Updated 8 months ago
- Repository for MLCommons Chakra schema and tools☆39Updated 2 years ago
- ☆166Updated last year