argonne-lcf / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

☆16

Alternatives and similar repositories for Megatron-DeepSpeed

Users who are interested in Megatron-DeepSpeed are comparing it to the repositories listed below.

NERSC / sc22-dl-tutorial
Material for the SC22 Deep Learning at Scale Tutorial
☆41 Updated 2 years ago
coreyjadams / CosmicTagger
Cosmic Tagging Network for Neutrino Physics
☆13 Updated last year
argonne-lcf / inference-endpoints
This is a repository with examples to run inference endpoints on various ALCF clusters
☆24 Updated last week
NERSC / sc20-dl-tutorial
☆21 Updated 4 years ago
NERSC / sc24-dl-tutorial
SC24 Deep Learning at Scale Tutorial Material
☆33 Updated 6 months ago
intel / intel-extension-for-openxla
☆49 Updated 2 months ago
argonne-lcf / GettingStarted
Collection of small examples for running on ALCF resources
☆19 Updated 2 weeks ago
argonne-lcf / ALCFBeginnersGuide
☆43 Updated 3 weeks ago
argonne-lcf / CompPerfWorkshop
ALCF Computational Performance Workshop
☆37 Updated 2 years ago
mkietzm4n / really-fast-dijkstra
"wow, that is really fast." - Kyle Gerard Felker
☆9 Updated 3 years ago
olcf / ai-training-series
AI Training Series Material
☆37 Updated 10 months ago
sparticlesteve / cosmoflow-benchmark
Benchmark implementation of CosmoFlow in TensorFlow Keras
☆21 Updated last year
axonn-ai / axonn
A parallel framework for training deep neural networks
☆63 Updated 4 months ago
intel / torch-xpu-ops
☆50 Updated this week
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆212 Updated this week
hpdps-group / COCCL
COCCL: Compression and precision co-aware collective communication library
☆24 Updated 4 months ago
spcl / sten
Sparsity support for PyTorch
☆36 Updated 4 months ago
NERSC / nersc-dl-wandb
Guidelines on using Weights and Biases logging for deep learning applications on NERSC machines
☆13 Updated 2 years ago
DataStates / datastates-llm
LLM checkpointing for DeepSpeed/Megatron
☆19 Updated 3 weeks ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆142 Updated 4 months ago
cchan / tccl
extensible collectives library in triton
☆88 Updated 4 months ago
Jokeren / triton-samples
☆28 Updated 6 months ago
hariharan-devarajan / dlio_benchmark
This is a repository for an I/O benchmark that represents scientific deep learning workloads.
☆23 Updated 2 years ago
NVIDIA / jaxpp
JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training
☆52 Updated last month
pmodels / yaksa
Yaksa: High-performance Noncontiguous Data Management
☆13 Updated 10 months ago
saforem2 / ezpz
Train across all your devices, ezpz 🍋
☆23 Updated this week
argonne-lcf / ALCF_Hands_on_HPC_Workshop
The ALCF hosts a regular simulation, data, and learning workshop to help users scale their applications. This repository contains the exa…
☆64 Updated 9 months ago
pranjalssh / fast.cu
Fastest kernels written from scratch
☆310 Updated 4 months ago
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆98 Updated 6 months ago
PrincetonUniversity / gpu_programming_intro
☆131 Updated 3 weeks ago