📑 Dive into Big Model Training
☆115Dec 1, 2022Updated 3 years ago
Alternatives and similar repositories for Dive-into-Big-Model-Training
Users who are interested in Dive-into-Big-Model-Training are comparing it to the libraries listed below.
- ☆15Apr 20, 2022Updated 4 years ago
- FTPipe and related pipeline model parallelism research.☆44May 16, 2023Updated 2 years ago
- ☆19Feb 15, 2023Updated 3 years ago
- ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.☆27Jul 6, 2023Updated 2 years ago
- Julia implementation of flash-attention operation for neural networks.☆11May 31, 2023Updated 2 years ago
- Primo: Practical Learning-Augmented Systems with Interpretable Models☆19Dec 26, 2023Updated 2 years ago
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.☆71Mar 20, 2025Updated last year
- ☆30Dec 2, 2022Updated 3 years ago
- ☆27May 31, 2023Updated 2 years ago
- Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo☆11Feb 12, 2023Updated 3 years ago
- Federated reconnaissance mini-ImageNet benchmark and baseline models☆13Sep 2, 2021Updated 4 years ago
- ☆16Sep 4, 2023Updated 2 years ago
- A schedule language for large model training☆152Aug 21, 2025Updated 8 months ago
- ☆13Feb 22, 2023Updated 3 years ago
- ☆42Apr 25, 2024Updated 2 years ago
- A concise implementation of SimCSE☆16Aug 2, 2021Updated 4 years ago
- 🕹 Implementation for the Compiling Engineering course (Spring 2020) at Peking University, adapted from the UCLA CS 132 project.☆10Jun 21, 2020Updated 5 years ago
- egraphs-good website☆18Mar 10, 2026Updated last month
- Code associated with the paper **Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees**.☆28Apr 25, 2023Updated 3 years ago
- An Attention Superoptimizer☆22Jan 20, 2025Updated last year
- Paper list for acceleration of transformers☆14Jul 1, 2023Updated 2 years ago
- A Sparse-tensor Communication Framework for Distributed Deep Learning☆13Nov 1, 2021Updated 4 years ago
- FPGA-based implementation of a user-mode interrupt hardware mechanism and an optimized operating system kernel☆10Apr 1, 2025Updated last year
- Machine Translation Web Interface for OpenNMT-py☆26Dec 24, 2021Updated 4 years ago
- "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation☆30Feb 4, 2025Updated last year
- Framework of pa code for THU compiler principle course.☆13Dec 18, 2019Updated 6 years ago
- ☆22Nov 7, 2018Updated 7 years ago
- [IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte…☆52May 31, 2023Updated 2 years ago
- Machine learning on serverless platform☆10Jul 2, 2022Updated 3 years ago
- Source code for iCache-HPCA'23☆50Apr 22, 2023Updated 3 years ago
- Torch Distributed Experimental☆117Aug 5, 2024Updated last year
- Statistics on multilingual datasets☆17Jul 12, 2022Updated 3 years ago
- ☆24Jul 7, 2024Updated last year
- ☆21Mar 15, 2023Updated 3 years ago
- how to learn PyTorch and OneFlow☆497Mar 22, 2024Updated 2 years ago
- A baseline repository of Auto-Parallelism in Training Neural Networks☆146Jun 25, 2022Updated 3 years ago
- Switch-based Training Acceleration for Machine Learning (SwitchML)☆16Apr 13, 2021Updated 5 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …☆66Mar 21, 2022Updated 4 years ago
- Source code for AdaMBE-SC'24☆25Jun 20, 2024Updated last year