arakhmati/torchtrail
torchtrail: trace the graph of torch functions and modules for visualization, reports, etc.
☆25 · Updated 11 months ago
Alternatives and similar repositories for torchtrail
Users interested in torchtrail are comparing it to the libraries listed below.
- Extensible collectives library in Triton ☆86 · Updated last month
- LLM training in simple, raw C/CUDA ☆95 · Updated last year
- High-performance SGEMM on CUDA devices ☆91 · Updated 3 months ago
- Explore training for quantized models ☆18 · Updated 4 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆55 · Updated this week
- Experimental GPU language with meta-programming ☆22 · Updated 8 months ago
- Make Triton easier ☆47 · Updated 11 months ago
- Experiment of using Tangent to autodiff Triton ☆78 · Updated last year
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆41 · Updated last year
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆173 · Updated last week
- ☆205 · Updated 3 weeks ago
- TTNN compiler for PyTorch 2: enables running PyTorch models on Tenstorrent hardware via the torch.compile path ☆38 · Updated this week
- ☆203 · Updated 10 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity ☆73 · Updated 8 months ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline ☆109 · Updated 10 months ago
- Fast low-bit matmul kernels in Triton ☆301 · Updated this week
- ☆21 · Updated 2 months ago
- A bunch of kernels that might make stuff slower 😉 ☆40 · Updated this week
- Applied AI experiments and examples for PyTorch ☆267 · Updated this week
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆106 · Updated 7 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆35 · Updated 10 months ago
- ☆79 · Updated 6 months ago
- Cray-LM unified training and inference stack ☆22 · Updated 3 months ago
- ☆158 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 ☆45 · Updated 10 months ago
- train with kittens! ☆57 · Updated 6 months ago
- ☆16 · Updated 7 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI ☆132 · Updated last year
- Tenstorrent's MLIR-based compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆48 · Updated this week
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆187 · Updated 11 months ago