pytorch / tlparse

TORCH_LOGS parser for PT2

☆59

Alternatives and similar repositories for tlparse

Users who are interested in tlparse are comparing it to the libraries listed below.

cchan / tccl
extensible collectives library in triton
☆87 Updated 5 months ago
meta-pytorch / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆45 Updated 3 weeks ago
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆299 Updated this week
openxla / shardy
MLIR-based partitioning system
☆125 Updated this week
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer Generator (WIP) for Triton Kernels
☆150 Updated this week
ROCm / aotriton
Ahead of Time (AOT) Triton Math Library
☆76 Updated last week
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆221 Updated this week
Jokeren / triton-samples
☆28 Updated 8 months ago
albanD / subclass_zoo
☆176 Updated last year
meta-pytorch / kraken
Triton-based Symmetric Memory operators and examples
☆28 Updated 3 weeks ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆97 Updated 2 months ago
open-lm-engine / flash-model-architectures
A bunch of kernels that might make stuff slower 😉
☆58 Updated 2 weeks ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆154 Updated 5 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆234 Updated this week
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆81 Updated 2 weeks ago
triton-lang / kernels
☆88 Updated 10 months ago
octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆114 Updated 2 years ago
jax-ml / ml_dtypes
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
☆296 Updated last week
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆354 Updated this week
pytorch / rfcs
PyTorch RFCs (experimental)
☆135 Updated 3 months ago
iree-org / iree-nvgpu
☆50 Updated last year
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 Updated last year
simveit / effective_transpose
Effective transpose on Hopper GPU
☆23 Updated last week
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆295 Updated 3 weeks ago
deepspeedai / DeepSpeed-Kernels
☆74 Updated 5 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆55 Updated last week
NVIDIA / tilus
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆342 Updated this week
NVIDIA / jaxpp
JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training
☆53 Updated last month
andylolu2 / simpleGEMM
The simplest but fast implementation of matrix multiplication in CUDA.
☆38 Updated last year
jansel / pytorch-jit-paritybench
☆40 Updated 9 months ago