A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code
☆15 · Mar 19, 2023 · Updated 2 years ago
Alternatives and similar repositories for ptxas-wrapper
Users who are interested in ptxas-wrapper are comparing it to the libraries listed below.
- PIRA - Automatic Instrumentation Refinement ☆16 · Mar 28, 2024 · Updated last year
- ☆14 · Sep 7, 2023 · Updated 2 years ago
- A PyTorch native platform for training generative AI models ☆15 · Nov 18, 2025 · Updated 3 months ago
- LLVM Plugin to Instrument Global Memory Accesses in CUDA Kernels ☆10 · Jun 8, 2020 · Updated 5 years ago
- ☆11 · Aug 10, 2021 · Updated 4 years ago
- ☆12 · Apr 1, 2025 · Updated 11 months ago
- GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs ☆16 · Apr 18, 2025 · Updated 10 months ago
- ☆55 · Nov 21, 2019 · Updated 6 years ago
- This is the open source version of HPL-MXP. The code performance has been verified on Frontier ☆18 · Jul 9, 2025 · Updated 7 months ago
- ☆17 · Nov 11, 2025 · Updated 3 months ago
- RISC-V vector extension ISA simulation ☆16 · Jun 11, 2019 · Updated 6 years ago
- Tristan-MP v2 [public] ☆18 · Dec 29, 2024 · Updated last year
- DXT Explorer is an interactive web-based log analysis tool for Darshan DXT logs. ☆17 · Feb 19, 2026 · Updated last week
- GPUOcelot: A dynamic compilation framework for PTX ☆218 · Feb 9, 2025 · Updated last year
- Benchmarks ☆17 · Apr 28, 2025 · Updated 10 months ago
- Time Ordered Astrophysics Scalable Tools ☆44 · Updated this week
- A GPU benchmark suite for autotuners ☆19 · Feb 20, 2024 · Updated 2 years ago
- Materials to teach terminal fundamentals for HPC users ☆19 · Aug 18, 2021 · Updated 4 years ago
- ☆18 · Jan 17, 2024 · Updated 2 years ago
- A series of high-performance GEMM (General Matrix Multiply) implementations Iteratively optimised for H100 GPUs in Pure CUDA. ☆70 · Feb 18, 2026 · Updated last week
- Scientific Machine Learning Tutorials ☆40 · Nov 20, 2021 · Updated 4 years ago
- This is the public repo for the MLPerf DeepCAM climate data segmentation proposal. ☆16 · Sep 30, 2025 · Updated 5 months ago
- OMP4Py: a native Python implementation of OpenMP ☆28 · Updated this week
- single-GPU to multi-GPU training of PyTorch apps at NERSC ☆22 · Apr 10, 2024 · Updated last year
- MPI Benchmark on AWS HPC cluster ☆20 · Jan 31, 2020 · Updated 6 years ago
- Vectorised data model base and helper classes. ☆20 · Updated this week
- ☆23 · Feb 17, 2026 · Updated last week
- Drishti provides I/O insights to help you improve your application's I/O performance. ☆23 · Feb 18, 2026 · Updated last week
- Memory consistency modelling using Alloy ☆31 · Dec 16, 2020 · Updated 5 years ago
- Process Orchestration Framework: A camunda 7 fork ☆21 · Feb 23, 2026 · Updated last week
- PIM-ML is a benchmark for training machine learning algorithms on the UPMEM architecture, which is the first publicly-available real-worl… ☆25 · Jan 7, 2025 · Updated last year
- Source code for "BenchPress: A Deep Active Benchmark Generator", PACT 2022 ☆21 · Mar 15, 2023 · Updated 2 years ago
- COCCL: Compression and precision co-aware collective communication library ☆30 · Mar 16, 2025 · Updated 11 months ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆26 · Updated this week
- Benchmark implementation of CosmoFlow in TensorFlow Keras ☆22 · Feb 7, 2024 · Updated 2 years ago
- High Performance Linpack for Next-Generation AMD HPC Accelerators ☆67 · Dec 10, 2025 · Updated 2 months ago
- An intelligent agent that understands your code and crafts perfect Git artifacts ☆32 · Updated this week
- ☆68 · May 29, 2019 · Updated 6 years ago
- Comb is a communication performance benchmarking tool. ☆26 · Feb 27, 2023 · Updated 3 years ago