TORCH_TRACE parser for PT2
☆87 · May 11, 2026 · Updated last month
Alternatives and similar repositories for tlparse
Users that are interested in tlparse are comparing it to the libraries listed below.
- PyTorch centric eager mode debugger ☆48 · Dec 16, 2024 · Updated last year
- ☆21 · Mar 3, 2025 · Updated last year
- ICSE2021 Submission ☆13 · Aug 28, 2022 · Updated 3 years ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆359 · Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆212 · Jun 10, 2026 · Updated 3 weeks ago
- depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile. ☆811 · Oct 13, 2025 · Updated 8 months ago
- extensible collectives library in triton ☆98 · Mar 31, 2025 · Updated last year
- Region-level profiling for CUDA kernels with trace, NVBit, CUPTI, NSys, and an interactive Explorer. ☆120 · Apr 17, 2026 · Updated 2 months ago
- ☆20 · Sep 22, 2023 · Updated 2 years ago
- [DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning ☆15 · Jan 13, 2024 · Updated 2 years ago
- Triton-based Symmetric Memory operators and examples ☆103 · May 15, 2026 · Updated last month
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆114 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆514 · Jun 9, 2026 · Updated 3 weeks ago
- Shared Middle-Layer for Triton Compilation ☆338 · Dec 5, 2025 · Updated 6 months ago
- Artifacts of EVT ASPLOS'24 ☆30 · Mar 6, 2024 · Updated 2 years ago
- ☆170 · Dec 27, 2024 · Updated last year
- A library to analyze PyTorch traces. ☆531 · May 29, 2026 · Updated last month
- Compiler for Dynamic Neural Networks ☆45 · Nov 13, 2023 · Updated 2 years ago
- PPX for template strings ☆14 · Nov 17, 2018 · Updated 7 years ago
- Standalone commandline CLI tool for compiling Triton kernels ☆20 · Sep 13, 2024 · Updated last year
- supporting pytorch FSDP for optimizers ☆84 · Dec 8, 2024 · Updated last year
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,336 · Updated this week
- PyTorch RFCs (experimental) ☆147 · Mar 27, 2026 · Updated 3 months ago
- ☆11 · Jun 15, 2026 · Updated 2 weeks ago
- Backward compatible ML compute opset inspired by HLO/MHLO ☆664 · Updated this week
- Simple python library for generating your own perfetto traces for your application. Can be used for both app instrumentation and custom … ☆26 · Jun 22, 2025 · Updated last year
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆1,037 · Updated this week
- ☆20 · May 30, 2026 · Updated last month
- Samples of good AI generated CUDA kernels ☆105 · May 30, 2025 · Updated last year
- An implementation of the Llama architecture, to instruct and delight ☆21 · May 31, 2025 · Updated last year
- Tensor Compute Primitives: Mid-level Intermediate Representation for Machine Learning Programs ☆35 · Jan 30, 2025 · Updated last year
- Abstract BSP tree in Rust ☆14 · Sep 3, 2021 · Updated 4 years ago
- ☆190 · Jun 16, 2024 · Updated 2 years ago
- CUDA Template Functions ☆20 · Dec 16, 2025 · Updated 6 months ago
- ☆91 · Jan 23, 2025 · Updated last year
- ☆13 · May 11, 2026 · Updated last month
- ☆146 · Aug 18, 2025 · Updated 10 months ago
- ☆19 · Jun 6, 2025 · Updated last year
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆892 · Updated this week