alpa-projects / tensorflow-alpa
☆20 · Updated last year
Related projects
Alternatives and complementary repositories for tensorflow-alpa
- ☆73 · Updated last year
- Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. ☆46 · Updated 11 months ago
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆141 · Updated 2 years ago
- An Efficient Pipelined Data Parallel Approach for Training Large Model ☆70 · Updated 3 years ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆114 · Updated 2 years ago
- ☆66 · Updated 3 years ago
- An experimental parallel training platform ☆52 · Updated 7 months ago
- Synthesizer for optimal collective communication algorithms ☆98 · Updated 7 months ago
- Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion ☆32 · Updated 6 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆250 · Updated this week
- MLIR-based partitioning system ☆42 · Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆202 · Updated 2 years ago
- Experiments and prototypes associated with IREE or MLIR ☆49 · Updated 3 months ago
- ☆90 · Updated 2 years ago
- nnScaler: Compiling DNN models for Parallel Training ☆77 · Updated 3 weeks ago
- ☆19 · Updated 4 months ago
- ☆45 · Updated 2 weeks ago
- Shared Middle-Layer for Triton Compilation ☆192 · Updated this week
- ☆140 · Updated last year
- A schedule language for large model training ☆141 · Updated 5 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆57 · Updated 6 months ago
- A home for the final text of all TVM RFCs. ☆101 · Updated 2 months ago
- ☆40 · Updated 3 years ago
- Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆57 · Updated 5 months ago
- Microsoft Collective Communication Library ☆322 · Updated last year
- AI and Memory Wall ☆206 · Updated 8 months ago
- A fast communication-overlapping library for tensor parallelism on GPUs ☆226 · Updated 3 weeks ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆195 · Updated 2 years ago
- NCCL Profiling Kit ☆113 · Updated 4 months ago
- Magicube: a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) in deep learning on Tensor Cores ☆81 · Updated 2 years ago