alpa-projects / tensorflow-alpa
☆20 Updated 2 years ago
Alternatives and similar repositories for tensorflow-alpa
Users who are interested in tensorflow-alpa compare it to the libraries listed below
- ☆79 Updated 2 years ago
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆144 Updated 3 years ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆59 Updated last year
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23) ☆82 Updated 2 years ago
- An experimental parallel training platform ☆54 Updated last year
- An Efficient Pipelined Data Parallel Approach for Training Large Model ☆77 Updated 4 years ago
- AI and Memory Wall ☆216 Updated last year
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche… ☆95 Updated 2 years ago
- Synthesizer for optimal collective communication algorithms ☆110 Updated last year
- MLIR-based partitioning system ☆105 Updated this week
- ☆75 Updated 4 years ago
- Microsoft Collective Communication Library ☆351 Updated last year
- A schedule language for large model training ☆149 Updated last year
- PerFlow-AI is a programmable performance analysis, modeling, prediction tool for AI system. ☆20 Updated 2 months ago
- ☆145 Updated 5 months ago
- Compiler for Dynamic Neural Networks ☆46 Updated last year
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆67 Updated 3 months ago
- Shared Middle-Layer for Triton Compilation ☆258 Updated this week
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆147 Updated 2 weeks ago
- Microsoft Collective Communication Library ☆64 Updated 7 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆79 Updated 8 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆383 Updated this week
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆121 Updated 3 years ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆23 Updated 2 months ago
- ☆124 Updated 2 months ago
- ☆80 Updated 2 months ago
- Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion ☆32 Updated last year
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆59 Updated 3 years ago
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆117 Updated last year
- A sandbox for quick iteration and experimentation on projects related to IREE, MLIR, and LLVM ☆59 Updated 3 months ago