arakhmati / torchtrail
torchtrail: trace the graph of torch functions and modules for visualization, reports, etc
☆25 Updated 2 weeks ago
Alternatives and similar repositories for torchtrail
Users who are interested in torchtrail compare it to the libraries listed below.
- Make triton easier ☆47 Updated 11 months ago
- Tenstorrent MLIR compiler ☆132 Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆60 Updated this week
- ⭐️ TTNN Compiler for PyTorch 2 ⭐️ It enables running PyTorch models on Tenstorrent hardware using torch.compile path ☆45 Updated this week
- A place to store reusable transformer components of my own creation or found on the interwebs ☆56 Updated 3 weeks ago
- High-Performance SGEMM on CUDA devices ☆94 Updated 4 months ago
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆898 Updated this week
- NanoGPT-speedrunning for the poor T4 enjoyers ☆66 Updated last month
- extensible collectives library in triton ☆87 Updated 2 months ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆61 Updated 4 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆86 Updated this week
- Attention in SRAM on Tenstorrent Grayskull ☆36 Updated 10 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 Updated 10 months ago
- ☆215 Updated this week
- A bunch of kernels that might make stuff slower 😉 ☆48 Updated this week
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models ☆46 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆133 Updated last year
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models ☆36 Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆67 Updated 2 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆153 Updated this week
- The simplest but fast implementation of matrix multiplication in CUDA. ☆35 Updated 10 months ago
- ☆10 Updated last week
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 Updated last year
- Samples of good AI generated CUDA kernels ☆73 Updated last week
- train with kittens! ☆59 Updated 7 months ago
- Frontend integration for PyTorch with tt-mlir ☆21 Updated this week
- ☆38 Updated 10 months ago
- ☆107 Updated 2 months ago
- Cray-LM unified training and inference stack. ☆22 Updated 4 months ago
- ☆88 Updated last year