ademeure / QuickRunCUDA
☆13Updated 2 weeks ago
Alternatives and similar repositories for QuickRunCUDA
Users that are interested in QuickRunCUDA are comparing it to the libraries listed below
- ☆49Updated 6 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆100Updated 4 months ago
- ☆36Updated last week
- Framework to reduce autotune overhead to zero for well known deployments.☆85Updated last month
- extensible collectives library in triton☆91Updated 7 months ago
- DeeperGEMM: crazy optimized version☆73Updated 6 months ago
- ☆65Updated 6 months ago
- ☆31Updated 4 months ago
- An experimental communicating attention kernel based on DeepEP.☆34Updated 3 months ago
- ☆50Updated 5 months ago
- How to ensure correctness and ship LLM generated kernels in PyTorch☆117Updated this week
- ☆93Updated last year
- Triton-based Symmetric Memory operators and examples☆62Updated last month
- Benchmark tests supporting the TiledCUDA library.☆17Updated 11 months ago
- ☆63Updated last week
- Automatic differentiation for Triton Kernels☆30Updated 3 months ago
- Debug print operator for cudagraph debugging☆14Updated last year
- A bunch of kernels that might make stuff slower 😉☆64Updated last week
- ☆106Updated 5 months ago
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".☆75Updated last month
- Autonomous GPU Kernel Generation via Deep Agents☆123Updated this week
- ☆12Updated 10 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming☆104Updated last week
- GitHub mirror of triton-lang/triton repo.☆98Updated last week
- ☆71Updated 7 months ago
- Building the Virtuous Cycle for AI-driven LLM Systems☆88Updated last week
- Example of applying CUDA graphs to LLaMA-v2☆12Updated 2 years ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning☆128Updated last week
- Tutorials for NVIDIA CUPTI samples☆38Updated 2 weeks ago
- Transformers components but in Triton☆34Updated 6 months ago