spcl / substation

Research and development for optimizing transformers

☆131

Alternatives and similar repositories for substation

Users that are interested in substation are comparing it to the libraries listed below


parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆136 Updated 3 years ago
saareliad / FTPipe
FTPipe and related pipeline model parallelism research.
☆43 Updated 2 years ago
awslabs / slapo
A schedule language for large model training
☆151 Updated 2 months ago
zhuohan123 / terapipe
☆75 Updated 4 years ago
microsoft / varuna
☆252 Updated last year
ptillet / torch-blocksparse
Block-sparse primitives for PyTorch
☆160 Updated 4 years ago
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161 Updated last month
awslabs / raf
☆145 Updated 8 months ago
stanford-futuredata / stk
☆112 Updated last year
hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆120 Updated 10 months ago
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆275 Updated 4 years ago
petuum / autodist
Simple Distributed Deep Learning on TensorFlow
☆134 Updated 4 months ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆41 Updated 2 years ago
HPDL-Group / Merak
☆81 Updated 5 months ago
Distributed-AI / PipeTransformer
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021
☆56 Updated 4 years ago
awslabs / lorien
☆42 Updated 2 years ago
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆273 Updated 2 months ago
cmu-catalyst / collage
System for automated integration of deep learning backends.
☆47 Updated 3 years ago
NVIDIA / LDDL
Distributed preprocessing and data loading for language datasets
☆39 Updated last year
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆86 Updated this week
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158 Updated 2 years ago
microsoft / SparTA
☆153 Updated last year
albanD / subclass_zoo
☆178 Updated last year
facebookresearch / fairring
Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …
☆65 Updated 3 years ago
cchan / tccl
Extensible collectives library in Triton
☆89 Updated 6 months ago
YulhwaKim / cutlass_tilesparse
CUDA templates for tile-sparse matrix multiplication based on CUTLASS.
☆50 Updated 7 years ago
marsupialtail / sparsednn
Fast sparse deep learning on CPUs
☆56 Updated 3 years ago
uwsampl / dtr-prototype
Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616
☆132 Updated 2 years ago
pytorch / rfcs
PyTorch RFCs (experimental)
☆135 Updated 4 months ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆138 Updated 2 years ago