gpauloski / BERT-PyTorch

BERT for Distributed PyTorch + AMP Training

☆12

Alternatives and similar repositories for BERT-PyTorch

Users who are interested in BERT-PyTorch are comparing it to the repositories listed below.

Peidong-Wang / Distributed-TensorFlow-Using-MPI
Template for Deploying Distributed TensorFlow on Clusters Using MPI
☆15 Updated 5 years ago
nvidia-china-sae / WholeGraph
☆11 Updated 4 years ago
NVIDIA / LDDL
Distributed preprocessing and data loading for language datasets
☆39 Updated last year
hpcaitech / ColossalAI-Benchmark
Performance benchmarking with ColossalAI
☆39 Updated 2 years ago
intel / optimized-models
☆26 Updated 2 years ago
HicrestLaboratory / SPARTA
SParse AcceleRation on Tensor Architecture
☆17 Updated 2 months ago
DmitryLyakh / CUDA_Tutorial
☆23 Updated 5 years ago
facebookresearch / fairring
Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …
☆65 Updated 3 years ago
The-AI-Summer / pytorch-ddp
code for the ddp tutorial
☆32 Updated 3 years ago
usyd-fsalab / NeuralNetworkRandomness
☆14 Updated 3 years ago
L1aoXingyu / llm-infer-bench
☆11 Updated last year
yuanwei2019 / EAdam-optimizer
Some improvements on Adam
☆28 Updated 4 years ago
af-ayala / heffte
Highly Efficient FFT for Exascale
☆38 Updated last year
mcarilli / mixed_precision_references
Personal collection of references for high performance mixed precision training.
☆41 Updated 5 years ago
HabanaAI / Megatron-DeepSpeed
Intel Gaudi's Megatron DeepSpeed Large Language Models for training
☆13 Updated 5 months ago
NVIDIA / dllogger
A logging tool for deep learning.
☆58 Updated 2 months ago
xuqifan897 / Optimus
☆27 Updated 3 years ago
NERSC / sc21-dl-tutorial
Material for the SC21 Deep Learning at Scale Tutorial
☆25 Updated 2 years ago
HabanaAI / Model-References
Reference models for Intel(R) Gaudi(R) AI Accelerator
☆161 Updated 2 weeks ago
mlcommons / training_results_v0.7
This repository contains the results and code for the MLPerf™ Training v0.7 benchmark.
☆56 Updated 2 years ago
graphcore / tutorials
Training material for IPU users: tutorials, feature examples, simple applications
☆86 Updated 2 years ago
CHARM-Tx / linear_mem_attention_pytorch
Unofficial implementation of https://arxiv.org/abs/2112.05682 to get linear memory cost on attention for PyTorch
☆12 Updated 3 years ago
instance-wise-ordered-transformer / IOT
☆20 Updated 4 years ago
RUCAIBox / MPOP
☆13 Updated 3 years ago
mlcommons / training_results_v2.1
This repository contains the results and code for the MLPerf™ Training v2.1 benchmark.
☆15 Updated last year
LeeJuly30 / BERTCpp
implement bert in pure c++
☆36 Updated 5 years ago
reger-men / HPL_GPU
High-Performance Linpack Benchmark adopted version for GPU backend
☆11 Updated 2 years ago
mlcommons / training_results_v4.0
This repository contains the results and code for the MLPerf™ Training v4.0 benchmark.
☆12 Updated 11 months ago
ROCm / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
☆22 Updated this week
Harry-Chen / InfMoE
Inference framework for MoE layers based on TensorRT with Python binding
☆41 Updated 4 years ago