ademeure/QuickRunCUDA

☆15 · Updated this week

Alternatives and similar repositories for QuickRunCUDA

Users who are interested in QuickRunCUDA are comparing it to the libraries listed below.

cherichy / tilecute
☆32 · Jul 2, 2025 · Updated 7 months ago
HuyNguyen-hust / hopper-gemm-101
☆11 · Dec 22, 2024 · Updated last year
huggingface / hf-rocm-kernels
☆23 · Jul 11, 2025 · Updated 7 months ago
simveit / persistent_dense_gemm
Persistent dense gemm for Hopper in `CuTeDSL`
☆15 · Aug 9, 2025 · Updated 6 months ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆18 · Nov 19, 2024 · Updated last year
ademeure / cuda-side-boost
☆53 · Updated this week
FindHao / drgpu
A Top-Down Profiler for GPU Applications
☆22 · Feb 29, 2024 · Updated 2 years ago
ihavnoid / tg4perfetto
Simple python library for generating your own perfetto traces for your application. Can be used for both app instrumentation and custom …
☆25 · Jun 22, 2025 · Updated 8 months ago
aws-neuron / nki-library
☆44 · Updated this week
yixiaoer / tpu-training-example
☆16 · Jul 8, 2024 · Updated last year
Snektron / gpumode-amd-fp8-mm
My submission for the GPUMODE/AMD fp8 mm challenge
☆29 · Jun 4, 2025 · Updated 8 months ago
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21 · May 31, 2025 · Updated 9 months ago
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated 9 months ago
muriloboratto / NVSHEMEM
Sample Codes using NVSHMEM on Multi-GPU
☆30 · Jan 22, 2023 · Updated 3 years ago
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated 10 months ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆35 · Jul 29, 2025 · Updated 7 months ago
feifeibear / DPSKV3MFU
Estimate MFU for DeepSeekV3
☆26 · Jan 5, 2025 · Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆221 · Jan 20, 2026 · Updated last month
TIGER-AI-Lab / Context-Forcing
Consistent Autoregressive Video Generation with Long Context
☆67 · Feb 6, 2026 · Updated 3 weeks ago
chengzeyi / piflux
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆18 · Nov 18, 2024 · Updated last year
Karbo123 / pytorch_grouped_gemm
High Performance Grouped GEMM in PyTorch
☆31 · May 10, 2022 · Updated 3 years ago
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35 · Jul 3, 2024 · Updated last year
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆129 · Jun 24, 2025 · Updated 8 months ago
NYCU-AI-EDA / Netlistify
☆27 · Dec 3, 2025 · Updated 2 months ago
catqaq / NLP-Notes
Detailed, bilingually annotated word2vec source code (well-annotated word2vec)
☆10 · Oct 3, 2021 · Updated 4 years ago
pclubiitk / valentine
Valentine's Day Anonymous matching
☆10 · Jul 25, 2014 · Updated 11 years ago
flashinfer-ai / flashinfer-bench
Building the Virtuous Cycle for AI-driven LLM Systems
☆186 · Feb 19, 2026 · Updated last week
replicate / go
Repository for go shared libraries (for now).
☆11 · Dec 1, 2025 · Updated 3 months ago
NVIDIA / ib-traffic-monitor
A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node
☆62 · Dec 19, 2025 · Updated 2 months ago
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆338 · Updated this week
cchan / tccl
extensible collectives library in triton
☆95 · Mar 31, 2025 · Updated 11 months ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47 · Sep 22, 2024 · Updated last year
MeshInfra / WaferLLM
WaferLLM: Large Language Model Inference at Wafer Scale
☆90 · Jan 7, 2026 · Updated last month
nanomaoli / llm_reproducibility
☆79 · Feb 10, 2026 · Updated 2 weeks ago
meta-pytorch / kraken
Triton-based Symmetric Memory operators and examples
☆85 · Jan 15, 2026 · Updated last month
hao-ai-lab / DistCA
Efficient Long-context Language Model Training by Core Attention Disaggregation
☆91 · Updated this week
rndsrc / orbits-py
Speeding Up Your Python Codes 1000x
☆12 · Apr 2, 2025 · Updated 10 months ago
mikaylagawarecki / transformer_tutorial_accompaniment
☆20 · Oct 4, 2024 · Updated last year
supersymmetry-technologies / BigBang-Proton
BigBang-Proton is an LLM pretrained on cross-scale, cross-structure, cross-discipline real-world scientific tasks to construct a scienti…
☆22 · Nov 8, 2025 · Updated 3 months ago