gnovack / distributed-training-and-deepspeed

☆17

Alternatives and similar repositories for distributed-training-and-deepspeed

Users interested in distributed-training-and-deepspeed are comparing it to the libraries listed below.


mgmalek / efficient_cross_entropy
☆121 · Updated last year
lessw2020 / transformer_central
Various transformers for FSDP research
☆38 · Updated 3 years ago
AnswerDotAI / cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…
☆146 · Updated last year
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆217 · Updated this week
pytorch / torchdistx
Torch Distributed Experimental
☆117 · Updated last year
jaymody / speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆98 · Updated 2 years ago
hamelsmu / llama-inference
experiments with inference on llama
☆103 · Updated last year
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆62 · Updated last week
llm-efficiency-challenge / neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
☆257 · Updated 2 years ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆277 · Updated 3 years ago
anyscale / llm-continuous-batching-benchmarks
☆122 · Updated last year
kshitij12345 / torchnnprofiler
Context Manager to profile the forward and backward times of PyTorch's nn.Module
☆83 · Updated 2 years ago
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆271 · Updated this week
jxmorris12 / bm25_pt
Minimal PyTorch implementation of BM25 (with sparse tensors)
☆104 · Updated last month
Jokeren / triton-samples
☆28 · Updated 10 months ago
microsoft / varuna
☆252 · Updated last year
sgugger / torchdynamo-tests
☆19 · Updated 3 years ago
stas00 / ml-ways
ML/DL Math and Method notes
☆64 · Updated last year
srush / triton-autodiff
Experiment using Tangent to autodiff Triton
☆80 · Updated last year
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆73 · Updated last year
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆151 · Updated 2 years ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆226 · Updated last year
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161 · Updated 2 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆196 · Updated 5 months ago
tcapelle / llm_recipes
A set of scripts and notebooks on LLM fine-tuning and dataset creation
☆111 · Updated last year
gpu-mode / profiling-cuda-in-torch
☆177 · Updated last year
TobiasNorlund / retro
Official repo for "On the Generalization Ability of Retrieval-Enhanced Transformers"
☆44 · Updated last year
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84 · Updated this week
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆307 · Updated 3 months ago
cloneofsimo / min-fsdp
☆91 · Updated last year