Ratbuyer/h100-features

☆18

Alternatives and similar repositories for h100-features

Users who are interested in h100-features are comparing it to the libraries listed below.

Faraz9877 / H100_GEMM
High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste…
☆11 · Dec 4, 2024 · Updated last year
BakrN / vortex
☆19 · Mar 8, 2025 · Updated last year
FdyCN / PTX-ISA
Chinese translation of the CUDA PTX-ISA documentation
☆56 · Sep 29, 2025 · Updated 9 months ago
romnn / microgpusim
Cycle-level, trace-driven, parallel GPU simulator for NVIDIA Pascal.
☆16 · Dec 13, 2025 · Updated 7 months ago
SpRegTiling / sparse-register-tiling
☆10 · Mar 2, 2024 · Updated 2 years ago
OpenCilk / cilkrts
A copy of the Intel Cilk Plus runtime system with modifications to work with OpenCilk and its associated tools.
☆12 · Jan 20, 2021 · Updated 5 years ago
ggsharma / microgradpp
A header-only C++ autograd engine and neural network library inspired by Karpathy's micrograd. Learn backpropagation in modern C++17.
☆16 · Jan 14, 2026 · Updated 6 months ago
PerfVec / PerfVec
A generalizable machine learning-based performance modeling framework.
☆20 · Jun 9, 2025 · Updated last year
zartbot / zadns
☆32 · Sep 12, 2021 · Updated 4 years ago
sailfish009 / hqemu
HQEMU v2.5.1 is a retargetable and multi-threaded dynamic binary translator on multicores
☆25 · Mar 21, 2018 · Updated 8 years ago
Bruce-Lee-LY / cuda_back2back_hgemm
Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction.
☆13 · Nov 3, 2023 · Updated 2 years ago
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆35 · Jul 28, 2020 · Updated 5 years ago
georgia-tech-synergy-lab / hardtaco-hls
HLS project modeling various sparse accelerators.
☆12 · Jan 11, 2022 · Updated 4 years ago
BBuf / KDA-Pilot
☆231 · Updated this week
robcasloz / llvm-discovery
Discovery of Structured Parallelism In Sequential and Parallel Code
☆10 · Feb 13, 2021 · Updated 5 years ago
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆194 · Mar 27, 2025 · Updated last year
pranjalssh / fast.cu
Fastest kernels written from scratch
☆583 · Sep 18, 2025 · Updated 10 months ago
bsc-mem / Mess-simulator
A fast, accurate, and easy-to-integrate memory simulator that models memory system performance with bandwidth-latency curves.
☆32 · Apr 29, 2026 · Updated 2 months ago
KnowingNothing / MatmulTutorial
An easy-to-understand TensorOp Matmul tutorial
☆445 · Mar 5, 2026 · Updated 4 months ago
taraeicher / PerceptronBranchPredictor
Perceptron-based branch predictor written in C++
☆14 · Dec 14, 2016 · Updated 9 years ago
ueqri / vis4mesh
Visualization tool for designing mesh Network-on-Chips (NoC) and assisting with architecture research
☆17 · Jan 21, 2024 · Updated 2 years ago
doingself / ARKitApp
ARKit demo
☆11 · Aug 20, 2018 · Updated 7 years ago
hkust-adsl / gass
☆43 · Apr 3, 2022 · Updated 4 years ago
ConvolutedDog / gpgpu-sim-comments
GPGPU-Sim source code annotated in Chinese: the latest version of the GPGPU-Sim simulator code, with Chinese comments to help Chinese-speaking users better understand and use the simulator.
☆30 · Dec 18, 2024 · Updated last year
EngAhmed21 / RISC-V-Processor-with-Pipelining
Implementation of the pipelined RISC V processor with many useful features as fully bypassing, dynamic branch prediction, single and mult…
☆18 · Feb 12, 2024 · Updated 2 years ago
minhhn2910 / cuda-half2
Convert CUDA programs from float data type to half or half2 with SIMDization
☆19 · May 28, 2019 · Updated 7 years ago
bsc-mem / Mess-benchmark
A multiplatform benchmark designed to provide a holistic, detailed and close-to-hardware view of memory system performance with family of b…
☆46 · Oct 15, 2025 · Updated 9 months ago
ryanmacdonald / Ray-Tracing-GPU
RTL implementation of a ray-tracing GPU
☆16 · Dec 18, 2012 · Updated 13 years ago
ColfaxResearch / cutlass-kernels
☆269 · Jul 11, 2024 · Updated 2 years ago
zhaosiying12138 / PDG_demo
A toy implementation of a Program Dependence Graph using LLVM
☆13 · Sep 27, 2023 · Updated 2 years ago
vancemiller / CUDA-preemption
Experiments evaluating preemption on the NVIDIA Pascal architecture
☆16 · Nov 10, 2016 · Updated 9 years ago
hummingtree / cuda-graph-with-dynamic-parameters
☆17 · Aug 9, 2022 · Updated 3 years ago
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆556 · Sep 8, 2024 · Updated last year
jaspreetsingh009 / ADC_LCD_FPGA
ADC & LCD Interfacing using Verilog & VHDL
☆12 · Feb 27, 2017 · Updated 9 years ago
aikitoria / nanotrace
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
☆137 · Updated this week
masahi / tvm-cutlass-eval
☆41 · Mar 31, 2022 · Updated 4 years ago
Bruce-Lee-LY / matrix_multiply
Several common methods of matrix multiplication are implemented on CPU and NVIDIA GPU using C++11 and CUDA.
☆14 · Feb 8, 2023 · Updated 3 years ago
RRZE-HPC / gpu-benches
Collection of benchmarks to measure basic GPU capabilities
☆530 · Oct 24, 2025 · Updated 8 months ago
ademeure / cuda-side-boost
☆60 · Feb 24, 2026 · Updated 4 months ago