hpcaitech/SkyComputing

hpcaitech / SkyComputing

Sky Computing: Accelerating Geo-distributed Computing in Federated Learning

☆90

Alternatives and similar repositories for SkyComputing

Users who are interested in SkyComputing are comparing it to the libraries listed below.

hpcaitech / ColossalAI-Benchmark
Performance benchmarking with ColossalAI
☆39 · Jul 6, 2022 · Updated 4 years ago
hpcaitech / PaLM-colossalai
Scalable PaLM implementation of PyTorch
☆190 · Dec 19, 2022 · Updated 3 years ago
hpcaitech / ColossalAI-Examples
Examples of training models with hybrid parallelism using ColossalAI
☆339 · Mar 23, 2023 · Updated 3 years ago
zhuzilin / pytorch-malloc
An external memory allocator example for PyTorch.
☆16 · Aug 10, 2025 · Updated 11 months ago
hpcaitech / Titans
A collection of models built with ColossalAI
☆33 · Nov 22, 2022 · Updated 3 years ago
feifeibear / PSTensor
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
☆10 · Feb 10, 2022 · Updated 4 years ago
hpcaitech / FastFold
Optimizing AlphaFold Training and Inference on GPU Clusters
☆617 · Jul 16, 2024 · Updated 2 years ago
hpcaitech / ColossalAI-Pytorch-lightning
☆24 · Nov 22, 2022 · Updated 3 years ago
hpcaitech / EnergonAI
Large-scale model inference.
☆629 · Sep 12, 2023 · Updated 2 years ago
hpcaitech / ColossalAI-Documentation
Documentation for Colossal-AI
☆25 · Jun 6, 2025 · Updated last year
microsoft / dist-ir
An IR for efficiently simulating distributed ML computation.
☆33 · Jan 13, 2024 · Updated 2 years ago
alibaba / SRDiffusion
Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation
☆20 · Jun 11, 2025 · Updated last year
hpcaitech / Elixir
Elixir: Train a Large Language Model on a Small GPU Cluster
☆16 · Jun 8, 2023 · Updated 3 years ago
jinminhao / PANTS
[Usenix Security '25] Robustifying ML-powered Network Classifiers with PANTS
☆23 · Aug 16, 2025 · Updated 11 months ago
hpcaitech / GPT-Demo
GPT Demo with hybrid distributed training
☆10 · Dec 1, 2022 · Updated 3 years ago
zhuzilin / chatgpt-desktop
Desktop version of ChatGPT; supports manually setting the cookie
☆19 · Dec 9, 2022 · Updated 3 years ago
shuoshuc / FabricEval
An evaluation framework for data center traffic engineering.
☆14 · Jul 28, 2024 · Updated last year
hpcaitech / CachedEmbedding
A memory-efficient DLRM training solution using ColossalAI
☆108 · Nov 22, 2022 · Updated 3 years ago
AD1024 / veripy
Python3 auto-active verification library (migrated to an Intel project)
☆24 · Apr 7, 2022 · Updated 4 years ago
ADG4050 / Exploring-Lightweight-Federated-Learning-for-load-forecasting
Federated Learning on Energy Dataset for load forecasting using clustering and sequential DNN methods
☆14 · Sep 19, 2024 · Updated last year
alibaba / GPU-scheduler-for-deep-learning
GPU-scheduler-for-deep-learning
☆214 · Nov 5, 2020 · Updated 5 years ago
kay-cottage / Mini_Reverse_Proxy
A mini Python tool for NAT traversal and reverse/forward proxying, implemented in under 100 lines of code
☆12 · May 27, 2023 · Updated 3 years ago
bytedance / QSync
Official repository for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20 · Feb 23, 2024 · Updated 2 years ago
pengyanghua / DL2
A deep learning-driven scheduler for elastic training in deep learning clusters
☆31 · Jan 14, 2021 · Updated 5 years ago
PersiaML / PERSIA
High performance distributed framework for training deep learning recommendation models based on PyTorch.
☆414 · Updated this week
H-Freax / Awesome-Graph-RAG
This repository compiles a list of papers/resources related to the graph retrieval-augmented generation! Star⭐ the repo and follow me if …
☆10 · Dec 7, 2024 · Updated last year
yanghu-bit / FlexEntry
Mitigating Routing Update Overhead for Traffic Engineering by Combining Destination-based Routing with Reinforcement Learning
☆15 · Oct 16, 2022 · Updated 3 years ago
BorealisAI / ssl-for-timeseries
Self Supervised Learning for Time Series Using Similarity Distillation
☆11 · Jun 29, 2022 · Updated 4 years ago
ganler / nnsmith-asplos-artifact
https://nnsmith-asplos.rtfd.io Artifact of "NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers" ASPLOS'23
☆11 · Mar 29, 2023 · Updated 3 years ago
ROBUST-NL / paused_ev_charging
Source code for the paper titled: "Unlocking the full potential of smart charging: Addressing paused and delayed charging problems in ele…
☆11 · May 22, 2024 · Updated 2 years ago
yusx-swapp / SPATL
SPATL: Salient Parameter Aggregation and Transfer Learning for Heterogeneous Federated Learning
☆24 · Nov 17, 2022 · Updated 3 years ago
harvard-cns / Harvard-CNS-Seminar
Reading seminar in Harvard Cloud Networking and Systems Group
☆16 · Aug 29, 2022 · Updated 3 years ago
PipeFusion / PipeFusion
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
☆58 · May 3, 2026 · Updated 2 months ago
S-Lab-System-Group / ChronusArtifact
☆23 · Jan 7, 2022 · Updated 4 years ago
microsoft / Soroush
Microsoft's open source max-min fair solver for cluster scheduling and traffic engineering
☆19 · Apr 13, 2026 · Updated 3 months ago
quiver-team / quiver-feature
High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph
☆55 · Jul 3, 2022 · Updated 4 years ago
netiken / m4
[TBD] "m4: A Learned Flow-level Network Simulator" by Chenning Li, Anton A. Zabreyko, Om Chabra, Arash Nasr-Esfahany, Kevin Zhao, Pratees…
☆21 · Jun 19, 2026 · Updated last month
kaixindelele / EvolveWithAI
Some of my open-source documents
☆10 · Feb 18, 2025 · Updated last year
DLFC / ps-mpi
A parameter server implementation with MPI.
☆11 · Nov 15, 2017 · Updated 8 years ago