arakhmati / torchtrail
torchtrail: trace the graph of torch functions and modules for visualization, reports, etc.
☆25 Updated 10 months ago
Alternatives and similar repositories for torchtrail:
Users who are interested in torchtrail are comparing it to the libraries listed below.
- Attention in SRAM on Tenstorrent Grayskull ☆34 Updated 9 months ago
- extensible collectives library in triton ☆85 Updated 3 weeks ago
- High-Performance SGEMM on CUDA devices ☆90 Updated 3 months ago
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆829 Updated this week
- LLM training in simple, raw C/CUDA ☆92 Updated 11 months ago
- ☆21 Updated last month
- ☆13 Updated last month
- Cray-LM unified training and inference stack. ☆22 Updated 2 months ago
- ☆200 Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 Updated 9 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆130 Updated last year
- Ahead of Time (AOT) Triton Math Library ☆57 Updated last week
- ☆27 Updated 3 months ago
- ☆103 Updated 8 months ago
- Custom kernels in Triton language for accelerating LLMs ☆18 Updated last year
- Experiment of using Tangent to autodiff triton ☆78 Updated last year
- High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline. ☆106 Updated 9 months ago
- ☆16 Updated 7 months ago
- ☆198 Updated 9 months ago
- Fastest kernels written from scratch ☆236 Updated 3 weeks ago
- ☆87 Updated last year
- Applied AI experiments and examples for PyTorch ☆262 Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆72 Updated 7 months ago
- Make triton easier ☆47 Updated 10 months ago
- Tenstorrent MLIR compiler ☆120 Updated this week
- ☆31 Updated this week
- ☆78 Updated 5 months ago
- ⭐️ TTNN Compiler for PyTorch 2 ⭐️ It enables running PyTorch models on Tenstorrent hardware using torch.compile path ☆36 Updated this week
- a minimal cache manager for PagedAttention, on top of llama3. ☆83 Updated 8 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆165 Updated last month