chenyu-jiang / nsys2json
A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format.
☆22 · Updated 2 months ago
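To make the description concrete, here is a minimal sketch of this kind of conversion. It is not the nsys2json implementation. It assumes a kernel table named `CUPTI_ACTIVITY_KIND_KERNEL` with nanosecond `start`/`end` columns and a `StringIds` lookup table for kernel names. The function names `build_demo_db` and `kernels_to_trace_events` are hypothetical, and the demo database is synthetic. The output uses "complete" events (`"ph": "X"`) with microsecond timestamps, as the Trace Event Format expects.

```python
import json
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with an ASSUMED, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (
            start INTEGER, "end" INTEGER,
            deviceId INTEGER, streamId INTEGER, shortName INTEGER
        );
        INSERT INTO StringIds VALUES (1, 'gemm_kernel'), (2, 'relu_kernel');
        INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES
            (1000, 3000, 0, 7, 1),
            (4000, 4500, 0, 7, 2);
        """
    )
    return conn


def kernels_to_trace_events(conn):
    """Turn kernel rows into Trace Event Format 'complete' events."""
    rows = conn.execute(
        """
        SELECT k.start, k."end", k.deviceId, k.streamId, s.value
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON s.id = k.shortName
        ORDER BY k.start
        """
    )
    events = []
    for start_ns, end_ns, device_id, stream_id, name in rows:
        events.append(
            {
                "name": name,
                "cat": "cuda_kernel",
                "ph": "X",                        # complete event: start plus duration
                "ts": start_ns / 1000,            # assumed ns input, converted to us
                "dur": (end_ns - start_ns) / 1000,
                "pid": device_id,                 # one "process" row per device
                "tid": stream_id,                 # one "thread" row per stream
            }
        )
    return events


def to_chrome_trace_json(events):
    """Wrap the events in the JSON object form of the trace format."""
    return json.dumps({"traceEvents": events}, indent=2)
```

Calling `to_chrome_trace_json(kernels_to_trace_events(build_demo_db()))` yields a JSON string that can be saved to a file and opened in a trace viewer. Mapping devices to `pid` and streams to `tid` is one possible layout choice, made here so that each stream gets its own row.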
Related projects
Alternatives and complementary repositories for nsys2json
- An extension library of WMMA API (Tensor Core API) ☆84 · Updated 4 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆98 · Updated 2 months ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆43 · Updated 11 months ago
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles. ☆156 · Updated this week
- ☆80 · Updated 7 months ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆85 · Updated 8 months ago
- DietCode Code Release ☆61 · Updated 2 years ago
- ☆48 · Updated this week
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆116 · Updated 4 years ago
- GVProf: A Value Profiler for GPU-based Clusters ☆47 · Updated 7 months ago
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch ☆29 · Updated 3 months ago
- ☆38 · Updated 4 years ago
- ☆20 · Updated 2 years ago
- play gemm with tvm ☆84 · Updated last year
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆103 · Updated 2 years ago
- ☆128 · Updated this week
- Benchmark scripts for TVM ☆73 · Updated 2 years ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆95 · Updated this week
- A language and compiler for irregular tensor programs. ☆133 · Updated 6 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆55 · Updated this week
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆131 · Updated last year
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆57 · Updated 6 months ago
- Experimental projects related to TensorRT ☆81 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- ☆73 · Updated last year
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆81 · Updated last year
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆30 · Updated 3 months ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆114 · Updated 2 years ago
- Optimize GEMM with tensorcore step by step ☆15 · Updated 11 months ago
- Thunder Research Group's Collective Communication Library ☆26 · Updated 6 months ago