xuqifan897 / Optimus

☆28

Alternatives and similar repositories for Optimus

Users who are interested in Optimus are comparing it to the libraries listed below.

zhuohan123 / terapipe
☆77 Updated 4 years ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆68 Updated 8 months ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆43 Updated 3 years ago
HPDL-Group / Merak
☆80 Updated 6 months ago
awslabs / raf
☆145 Updated 10 months ago
saareliad / FTPipe
FTPipe and related pipeline model parallelism research.
☆43 Updated 2 years ago
spcl / substation
Research and development for optimizing transformers
☆131 Updated 4 years ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆224 Updated 2 years ago
hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆122 Updated last year
stanford-futuredata / stk
☆113 Updated last year
awslabs / lorien
☆42 Updated 2 years ago
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆218 Updated last year
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆80 Updated last year
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Models
☆76 Updated 4 years ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆169 Updated last month
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆98 Updated last week
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆120 Updated 2 months ago
awslabs / slapo
A schedule language for large model training
☆151 Updated 3 months ago
AlibabaPAI / FLASHNN
☆102 Updated last year
parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆137 Updated 3 years ago
parasailteam / coconet
☆83 Updated 2 years ago
cmu-catalyst / collage
System for automated integration of deep learning backends.
☆47 Updated 3 years ago
thu-pacman / FasterMoE
☆88 Updated 3 years ago
usyd-fsalab / fp6_llm
Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).
☆272 Updated 4 months ago
RulinShao / FastCkpt
Python package for rematerialization-aware gradient checkpointing
☆26 Updated 2 years ago
uwsampl / dtr-prototype
Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616
☆132 Updated 2 years ago
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆121 Updated 3 years ago
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆147 Updated 3 years ago
Hsword / Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …
☆123 Updated last year
anyscale / llm-continuous-batching-benchmarks
☆122 Updated last year