A framework that helps implement swizzle GPU kernels
☆50Feb 29, 2020Updated 6 years ago
Alternatives and similar repositories for swizzle-inventor
Users that are interested in swizzle-inventor are comparing it to the libraries listed below.
- ☆34Dec 19, 2025Updated 4 months ago
- A retargetable and extensible synthesis-based compiler for modern hardware architectures☆17Nov 20, 2025Updated 4 months ago
- A basic Docker-based installation of TVM☆11Jun 23, 2022Updated 3 years ago
- CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark☆34Apr 9, 2026Updated last week
- Visualize TVM Relay program graph☆12Nov 19, 2019Updated 6 years ago
- A tool for checking tool output inspired by LLVM's FileCheck☆13Aug 29, 2025Updated 7 months ago
- SIMD recipes, for various platforms (collection of code snippets)☆49Jun 3, 2021Updated 4 years ago
- ☆38Jul 19, 2025Updated 9 months ago
- A C compiler with SSA-based backend optimization☆15Mar 19, 2016Updated 10 years ago
- ☆27Oct 26, 2019Updated 6 years ago
- A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code☆16Mar 19, 2023Updated 3 years ago
- Flexible GPGPU instrumentation☆89Oct 10, 2019Updated 6 years ago
- Artifact for IPDPS'21: DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions.☆13Apr 6, 2021Updated 5 years ago
- ☆55Nov 21, 2019Updated 6 years ago
- Reticle evaluation (PLDI 2021)☆12Apr 12, 2021Updated 5 years ago
- Experimental - join the Nerves discord if interested☆16Updated this week
- A Coq framework to support structural design and proof of hardware cache-coherence protocols☆14May 7, 2022Updated 3 years ago
- Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616☆133Jul 6, 2023Updated 2 years ago
- A fully homomorphic encryption (FHE) framework with full-pipeline GPU acceleration☆21Apr 18, 2025Updated last year
- This repository moved to https://github.com/elm-community/graph☆16Feb 22, 2023Updated 3 years ago
- Custom extensions to the RISC-V isa simulator for the UCB-BAR ESP project☆17Nov 27, 2022Updated 3 years ago
- An experimental ahead of time compiler for Relay.☆49Apr 21, 2020Updated 5 years ago
- GPU model checker☆13Apr 17, 2019Updated 7 years ago
- ☆17Oct 1, 2015Updated 10 years ago
- Numpy-like encrypted matrix arithmetic library based on OpenFHE☆29Apr 10, 2026Updated last week
- Automatically exported from code.google.com/p/llvm-qemu☆11Apr 29, 2015Updated 10 years ago
- CUDAAdvisor: a GPU profiling tool☆53Aug 24, 2018Updated 7 years ago
- Cray Chapel scheduler for Apache Mesos☆22Mar 3, 2014Updated 12 years ago
- Parser script that processes PyTorch autograd profiler results and converts the JSON file to Excel.☆15Oct 8, 2019Updated 6 years ago
- Assembler for NVIDIA Maxwell architecture☆1,061Jan 3, 2023Updated 3 years ago
- ☆15Mar 6, 2021Updated 5 years ago
- Source code of the simulator used in the Mosaic paper from MICRO 2017: "Mosaic: A GPU Memory Manager with Application-Transparent Support…☆50Aug 21, 2018Updated 7 years ago
- Utilities for paper writing.☆12Jan 11, 2026Updated 3 months ago
- A framework for pipelined computing on GPU☆30Jul 17, 2019Updated 6 years ago
- Benchmark PyTorch Custom Operators☆14Jul 6, 2023Updated 2 years ago
- You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms☆12Apr 17, 2023Updated 3 years ago
- An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.☆51Jul 23, 2024Updated last year
- Benchmark for matrix multiplication between dense and block-sparse (BSR) matrices in TVM, blocksparse (Gray et al.), and cuSPARSE.☆23Aug 21, 2020Updated 5 years ago
- Verilog AST☆21Dec 2, 2023Updated 2 years ago