Research and development for optimizing transformers
☆132Feb 16, 2021Updated 5 years ago
Alternatives and similar repositories for substation
Users who are interested in substation are comparing it to the libraries listed below.
- DaCe - Data Centric Parallel Programming☆584Jun 4, 2026Updated last week
- A Data-Centric Compiler for Machine Learning☆85Dec 14, 2025Updated 5 months ago
- Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616☆133Jul 6, 2023Updated 2 years ago
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.☆10Feb 10, 2022Updated 4 years ago
- ☆78May 4, 2021Updated 5 years ago
- This repository has moved, please visit https://github.com/ai2cm/pace for the latest development of fv3core.☆13Dec 21, 2022Updated 3 years ago
- Distributed Communication-Optimal LU-factorization Algorithm☆12Aug 1, 2021Updated 4 years ago
- Source code repo for paper "TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation"☆10Aug 11, 2023Updated 2 years ago
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021☆56Jul 21, 2021Updated 4 years ago
- FTPipe and related pipeline model parallelism research.☆44May 16, 2023Updated 3 years ago
- C++/MPI proxies for distributed training of deep neural networks.☆16Jun 18, 2022Updated 3 years ago
- ☆13Mar 27, 2020Updated 6 years ago
- ☆19Jun 3, 2023Updated 3 years ago
- A library to analyze PyTorch traces.☆528May 29, 2026Updated 2 weeks ago
- Analyze network performance in distributed training☆20Oct 20, 2020Updated 5 years ago
- ☆251Jul 25, 2024Updated last year
- The code for our paper "Neural Architecture Search as Program Transformation Exploration"☆16Apr 28, 2021Updated 5 years ago
- ☆13Jan 23, 2021Updated 5 years ago
- This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered …☆17Apr 1, 2025Updated last year
- Rich editor for SDFGs with included profiling and debugging, static analysis, and interactive optimization.☆22Dec 9, 2025Updated 6 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training☆223Aug 19, 2024Updated last year
- A Chainer extension for K-FAC☆20Jun 16, 2019Updated 6 years ago
- BytePS examples (Vision, NLP, GAN, etc)☆19Nov 24, 2022Updated 3 years ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications☆127May 9, 2022Updated 4 years ago
- Standalone mini-app of the ECMWF cloud microphysics parameterization☆11Apr 22, 2026Updated last month
- Sequence-level 1F1B schedule for LLMs.☆37Aug 26, 2025Updated 9 months ago
- A GPU performance profiling tool for PyTorch models☆512Jul 13, 2021Updated 4 years ago
- MONeT framework for reducing memory consumption of DNN training☆174May 4, 2021Updated 5 years ago
- A GPipe implementation in PyTorch☆864Jul 25, 2024Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated last year
- A tensor-aware point-to-point communication primitive for machine learning☆287Dec 17, 2025Updated 5 months ago
- Large scale graph learning on a single machine.☆167Feb 25, 2025Updated last year
- A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.☆1,546Jul 18, 2025Updated 10 months ago
- PyTorch extensions for high performance and large scale training.☆3,407Apr 26, 2025Updated last year
- ☆13Nov 25, 2022Updated 3 years ago
- Paper and code for New Directions in Cloud Programming, CIDR 2021☆11Feb 17, 2021Updated 5 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.☆1,001Sep 19, 2024Updated last year
- Benchmark scripts for TVM☆74Mar 15, 2022Updated 4 years ago
- Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052☆479Mar 15, 2024Updated 2 years ago