Research and development for optimizing transformers
☆131Feb 16, 2021Updated 5 years ago
Alternatives and similar repositories for substation
Users that are interested in substation are comparing it to the libraries listed below.
- DaCe - Data Centric Parallel Programming☆580Updated this week
- A Data-Centric Compiler for Machine Learning☆85Dec 14, 2025Updated 3 months ago
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.☆10Feb 10, 2022Updated 4 years ago
- An external memory allocator example for PyTorch.☆16Aug 10, 2025Updated 7 months ago
- ☆78May 4, 2021Updated 4 years ago
- This repository has moved; please visit https://github.com/ai2cm/pace for the latest development of fv3core.☆13Dec 21, 2022Updated 3 years ago
- Distributed Communication-Optimal LU-factorization Algorithm☆12Aug 1, 2021Updated 4 years ago
- Source code repo for paper "TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation"☆10Aug 11, 2023Updated 2 years ago
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021☆56Jul 21, 2021Updated 4 years ago
- FTPipe and related pipeline model parallelism research.☆44May 16, 2023Updated 2 years ago
- C++/MPI proxies for distributed training of deep neural networks.☆15Jun 18, 2022Updated 3 years ago
- ☆13Mar 27, 2020Updated 5 years ago
- ☆20Jun 3, 2023Updated 2 years ago
- A library to analyze PyTorch traces.☆474Updated this week
- Analyze network performance in distributed training☆20Oct 20, 2020Updated 5 years ago
- ☆251Jul 25, 2024Updated last year
- The code for our paper "Neural Architecture Search as Program Transformation Exploration"☆16Apr 28, 2021Updated 4 years ago
- ☆13Jan 23, 2021Updated 5 years ago
- This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered …☆17Apr 1, 2025Updated 11 months ago
- Rich editor for SDFGs with included profiling and debugging, static analysis, and interactive optimization.☆22Dec 9, 2025Updated 3 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training☆222Aug 19, 2024Updated last year
- A Chainer extension for K-FAC☆20Jun 16, 2019Updated 6 years ago
- BytePS examples (Vision, NLP, GAN, etc)☆19Nov 24, 2022Updated 3 years ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications☆127May 9, 2022Updated 3 years ago
- Standalone mini-app of the ECMWF cloud microphysics parameterization☆11Feb 24, 2026Updated 3 weeks ago
- A GPU performance profiling tool for PyTorch models☆510Jul 13, 2021Updated 4 years ago
- Sequence-level 1F1B schedule for LLMs.☆38Aug 26, 2025Updated 6 months ago
- MONeT framework for reducing memory consumption of DNN training☆174May 4, 2021Updated 4 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆31Apr 2, 2025Updated 11 months ago
- A tensor-aware point-to-point communication primitive for machine learning☆284Dec 17, 2025Updated 3 months ago
- Large scale graph learning on a single machine.☆167Feb 25, 2025Updated last year
- A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU.☆1,542Jul 18, 2025Updated 8 months ago
- PyTorch extensions for high performance and large scale training.☆3,404Apr 26, 2025Updated 10 months ago
- ☆13Nov 25, 2022Updated 3 years ago
- Paper and code for New Directions in Cloud Programming, CIDR 2021☆11Feb 17, 2021Updated 5 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.☆1,003Sep 19, 2024Updated last year
- Benchmark scripts for TVM☆74Mar 15, 2022Updated 4 years ago
- Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052☆477Mar 15, 2024Updated 2 years ago
- ☆10Apr 29, 2023Updated 2 years ago