A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you are interested, please visit, star, or fork https://github.com/PKU-DAIR/Hetu
☆126 · Dec 18, 2023 · Updated 2 years ago
Alternatives and similar repositories for Hetu
Users that are interested in Hetu are comparing it to the libraries listed below.
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. ☆337 · Dec 13, 2025 · Updated 6 months ago
- Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs). If you hav… ☆25 · Oct 22, 2025 · Updated 7 months ago
- ☆21 · Oct 31, 2022 · Updated 3 years ago
- Research paper list for host networking: in a system view ☆10 · Jan 2, 2025 · Updated last year
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- ☆23 · Jan 7, 2022 · Updated 4 years ago
- [IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte… ☆52 · May 31, 2023 · Updated 3 years ago
- Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs). ☆181 · Updated this week
- A scalable graph learning toolkit for extremely large graph datasets. (WWW'22, 🏆 Best Student Paper Award) ☆158 · May 10, 2024 · Updated 2 years ago
- Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices ☆12 · Jul 1, 2021 · Updated 4 years ago
- High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph ☆55 · Jul 3, 2022 · Updated 3 years ago
- ☆13 · Jan 23, 2021 · Updated 5 years ago
- Herald: Accelerating Neural Recommendation Training with Embedding Scheduling (NSDI 2024) ☆23 · May 9, 2024 · Updated 2 years ago
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆145 · Jun 25, 2022 · Updated 3 years ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆24 · Nov 21, 2024 · Updated last year
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆57 · May 29, 2024 · Updated 2 years ago
- Artifact for PPoPP20 "Understanding and Bridging the Gaps in Current GNN Performance Optimizations" ☆42 · Nov 16, 2021 · Updated 4 years ago
- Towards Generalized and Efficient Blackbox Optimization System/Package (KDD 2021 & JMLR 2024) ☆442 · Mar 28, 2026 · Updated 2 months ago
- ☆92 · Apr 2, 2022 · Updated 4 years ago
- Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines ☆19 · Dec 8, 2023 · Updated 2 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 · Apr 2, 2025 · Updated last year
- Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4 ☆993 · Jun 4, 2026 · Updated 2 weeks ago
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training. ☆271 · Mar 31, 2023 · Updated 3 years ago
- ☆80 · Mar 7, 2022 · Updated 4 years ago
- ☆11 · Jun 25, 2021 · Updated 4 years ago
- Dorylus: Affordable, Scalable, and Accurate GNN Training ☆76 · May 31, 2021 · Updated 5 years ago
- paper and its code for AI System ☆366 · May 14, 2026 · Updated last month
- A fast MoE impl for PyTorch ☆1,855 · Feb 10, 2025 · Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency ☆113 · Sep 10, 2024 · Updated last year
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ☆1,887 · Updated this week
- Training and serving large-scale neural networks with auto parallelization. ☆3,182 · Dec 9, 2023 · Updated 2 years ago
- SOTA Learning-augmented Systems ☆37 · May 21, 2022 · Updated 4 years ago
- An experimental parallel training platform ☆57 · Mar 25, 2024 · Updated 2 years ago
- Microsoft Collective Communication Library ☆391 · Sep 20, 2023 · Updated 2 years ago
- nnScaler: Compiling DNN models for Parallel Training ☆132 · Jun 10, 2026 · Updated last week
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- ☆394 · Nov 4, 2022 · Updated 3 years ago
- Graph Sampling using GPU ☆52 · Mar 17, 2022 · Updated 4 years ago
- A Factored System for Sample-based GNN Training over GPUs ☆46 · Jul 26, 2023 · Updated 2 years ago