kshitij12345 / torchnnprofiler

Context Manager to profile the forward and backward times of PyTorch's nn.Module

☆83

Alternatives and similar repositories for torchnnprofiler

Users who are interested in torchnnprofiler are comparing it to the libraries listed below.

pytorch / torchdistx
Torch Distributed Experimental
☆117 Updated last year
pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆158 Updated last month
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆59 Updated last week
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79 Updated last year
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆273 Updated 3 years ago
lernapparat / torchhacks
Hacks for PyTorch
☆19 Updated 2 years ago
MathInf / toroidal
a lightweight transformer library for PyTorch
☆72 Updated 3 years ago
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆44 Updated 2 years ago
ezyang / torchdbg
PyTorch centric eager mode debugger
☆47 Updated 7 months ago
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆46 Updated last year
pytorch / multipy
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…
☆180 Updated 3 weeks ago
fidelity / stoke
A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-…
☆67 Updated 2 years ago
lessw2020 / transformer_central
Various transformers for FSDP research
☆37 Updated 2 years ago
pytorch-labs / torchfix
TorchFix - a linter for PyTorch-using code with autofix support
☆145 Updated 5 months ago
jiaweizzhao / ZerO-initialization
☆74 Updated 2 years ago
rasbt / cyclemoid-pytorch
Cyclemoid implementation for PyTorch
☆90 Updated 3 years ago
AminRezaei0x443 / memory-efficient-attention
Memory Efficient Attention (O(sqrt(n))) for Jax and PyTorch
☆184 Updated 2 years ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆71 Updated last year
stas00 / ml-ways
ML/DL Math and Method notes
☆62 Updated last year
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆128 Updated 3 weeks ago
pytorch / rfcs
PyTorch RFCs (experimental)
☆134 Updated 2 months ago
facebookresearch / fairring
Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …
☆65 Updated 3 years ago
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 Updated last year
facebookresearch / MODel_opt
Memory Optimizations for Deep Learning (ICML 2023)
☆102 Updated last year
rasbt / faster-pytorch-blog
Outlining techniques for improving the training performance of your PyTorch model without compromising its accuracy
☆128 Updated 2 years ago
cloneofsimo / min-fsdp
☆83 Updated last year
HomebrewML / HomebrewNLP-torch
A case study of efficient training of large language models using commodity hardware.
☆68 Updated 3 years ago
DeMoriarty / custom_matmul_kernels
Customized matrix multiplication kernels
☆56 Updated 3 years ago
lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆215 Updated last year
lianakoleva / no-libtorch-compile
☆21 Updated 5 months ago