JohndeVostok/APE

JohndeVostok / APE

A GPU FP32 computation method with Tensor Cores.

☆27

Alternatives and similar repositories for APE

Users that are interested in APE are comparing it to the libraries listed below.

ariasanovsky / ptx-parser
☆11 • Jun 9, 2023 • Updated 3 years ago
howardlau1999 / hcache-uring
2022 ECS CloudBuild Distributed Cache Contest - Final Round https://tianchi.aliyun.com/competition/entrance/531982/introduction
☆17 • Dec 8, 2022 • Updated 3 years ago
Cytosine2020 / crust
A Rust style C++ library.
☆19 • Sep 3, 2022 • Updated 3 years ago
getianao / ngAP
ngAP's artifact for ASPLOS'24
☆25 • Jul 29, 2025 • Updated 11 months ago
regehr / pldi22-llvm-tutorial
Outline and links for PLDI 2022 tutorial
☆17 • Jun 13, 2022 • Updated 4 years ago
yuyangJin / PerFlow-AI
PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems.
☆33 • May 12, 2026 • Updated 2 months ago
shen203 / GPU_Microbenchmark
☆25 • Jun 24, 2022 • Updated 4 years ago
eth-cscs / Tiled-MM
Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.
☆33 • Apr 2, 2025 • Updated last year
Faraz9877 / H100_GEMM
High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste…
☆11 • Dec 4, 2024 • Updated last year
daadaada / gas
☆49 • Dec 11, 2020 • Updated 5 years ago
Jokeren / GPA
GPU Performance Advisor
☆66 • Jul 25, 2022 • Updated 4 years ago
thu-pacman / HyQuas
A hybrid partitioner based quantum circuit simulation system on GPU
☆46 • Aug 17, 2022 • Updated 3 years ago
gty111 / SimpleUseGpgpuSim
GPGPU-Sim usage guide
☆14 • Nov 12, 2022 • Updated 3 years ago
roastduck / FreeTensor
A language and compiler for irregular tensor programs.
☆152 • Jul 16, 2026 • Updated last week
microsoft / SwitchML
Switch-based Training Acceleration for Machine Learning (SwitchML)
☆16 • Apr 13, 2021 • Updated 5 years ago
SYSU-SCC / sysu-scc-spack-repo
Spack package repository maintained by Student Cluster Competition Team @ Sun Yat-sen University.
☆16 • Aug 20, 2025 • Updated 11 months ago
qianl15 / this
Thousand Island Scanner: Scaling Video Analysis on AWS Lambda
☆13 • Oct 25, 2019 • Updated 6 years ago
oscomp / proj23-lightweight-hypervisor
Implementing a lightweight hypervisor on RISC-V processors.
☆12 • Dec 25, 2020 • Updated 5 years ago
Rivendile / Muri
Artifacts for our SIGCOMM'22 paper Muri
☆44 • Dec 29, 2023 • Updated 2 years ago
Nelson-Cheung / yatsenos-riscv
Rebuild YatSenOS On RISC-V 64.
☆23 • Jan 6, 2022 • Updated 4 years ago
caps-tum / mt4g
Memory Topology for GPUs
☆19 • Updated this week
heheda12345 / MagPy
☆41 • Jun 5, 2024 • Updated 2 years ago
Light-of-Hers / CCTV
C++ Compile-Time eValuator for Scheme
☆21 • Jun 29, 2020 • Updated 6 years ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 • Nov 19, 2024 • Updated last year
pyxis-roc / ptxparser
A parser for PTX 6.5
☆13 • Jun 19, 2023 • Updated 3 years ago
seb-v / amd_challenge_solutions
☆19 • Jun 6, 2025 • Updated last year
srvm / cupti_profiler
CUPTI GPU Profiler
☆39 • Feb 26, 2019 • Updated 7 years ago
systems-seminar-uiuc / systems-seminar-uiuc.github.io
Website for Systems Research Seminar at UIUC
☆21 • May 7, 2026 • Updated 2 months ago
xnd-project / cuda-benchmarks
Collection of CUDA benchmarks, with a focus on unified vs. explicit memory management.
☆21 • Oct 15, 2019 • Updated 6 years ago
jwnhy / coffer
Coffer is a RISC-V trusted execution environment developed in Rust.
☆21 • Mar 3, 2022 • Updated 4 years ago
yuyangJin / PerFlow
Domain-specific framework for performance analysis of parallel programs
☆25 • Mar 23, 2026 • Updated 4 months ago
project-flexos / asplos22-ae
FlexOS: Towards Flexible OS Isolation (ASPLOS'22) Artifact Evaluation Repository
☆19 • Apr 2, 2022 • Updated 4 years ago
efeslab / eecs582
Course website for Advanced Operating Systems
☆13 • Apr 8, 2022 • Updated 4 years ago
KernelTuner / kernel_launcher
Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner
☆22 • Sep 12, 2025 • Updated 10 months ago
Molecule-Serverless / molecule-artifact
Molecule's artifact for ASPLOS'22
☆30 • Feb 16, 2022 • Updated 4 years ago
AMDResearch / intellikit
IntelliKit is a collection of intelligent tools designed to make GPU kernel development, profiling, and validation accessible to LLMs and…
☆27 • Updated this week
racheesingh / cloud-te-tutorial
☆15 • Aug 12, 2023 • Updated 2 years ago
OpenPPL / CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
☆85 • Mar 20, 2023 • Updated 3 years ago
doingself / ARKitApp
ARKit demo
☆11 • Aug 20, 2018 • Updated 7 years ago