QianyanTech/NBAssembler

Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.

☆96

Alternatives and similar repositories for NBAssembler

Users who are interested in NBAssembler are comparing it to the repositories listed below.

cloudcores / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆609 · Apr 20, 2023 · Updated 3 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆126 · Jul 11, 2022 · Updated 4 years ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆246 · Jan 13, 2022 · Updated 4 years ago
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆420 · Jan 2, 2025 · Updated last year
temporal-hpc / reduction-tensor-cores
Fast GPU based tensor core reductions
☆12 · Jan 13, 2023 · Updated 3 years ago
daadaada / gas
☆49 · Dec 11, 2020 · Updated 5 years ago
Guangxuan-Xiao / SPMM-CUDA
☆13 · Jun 23, 2022 · Updated 4 years ago
SuperScientificSoftwareLaboratory / TileSpMV
Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…
☆13 · Aug 12, 2022 · Updated 3 years ago
hpc-ulisboa / gpupowermodel
GPU Power Modelling Tool
☆14 · Nov 15, 2019 · Updated 6 years ago
microsoft / cusync
☆27 · Feb 20, 2024 · Updated 2 years ago
0xD0GF00D / DocumentSASS
Unofficial description of the CUDA assembly (SASS) instruction sets.
☆224 · Jul 18, 2025 · Updated last year
OpenPPL / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆85 · Mar 20, 2023 · Updated 3 years ago
nicolaswilde / cuda-tensorcore-hgemm
☆160 · Dec 26, 2024 · Updated last year
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆365 · Jul 25, 2022 · Updated 3 years ago
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆42 · Jul 24, 2024 · Updated 2 years ago
YusukeNagasaka / Batched-SpMM
New batched algorithm for sparse matrix-matrix multiplication (SpMM)
☆16 · May 7, 2019 · Updated 7 years ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 · Jan 28, 2025 · Updated last year
SJTU-ACA-Lab / blue-porcelain
☆144 · May 23, 2024 · Updated 2 years ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
sunlex0717 / DissectingTensorCores
☆114 · Apr 19, 2024 · Updated 2 years ago
THU-DSP-LAB / ventus-gpgpu
GPGPU processor supporting RISCV-V extension, developed with Chisel HDL
☆932 · Jul 8, 2026 · Updated 2 weeks ago
hibagus / CUDA_Bench
CUDA GPU Benchmark
☆38 · Jan 31, 2025 · Updated last year
VivekPanyam / cudaparsers
Parsers for CUDA binary files
☆25 · Dec 29, 2023 · Updated 2 years ago
lixiuhong / batched_gemm
☆40 · Feb 28, 2020 · Updated 6 years ago
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆32 · Dec 21, 2024 · Updated last year
RRZE-HPC / gpu-benches
Collection of benchmarks to measure basic GPU capabilities
☆530 · Oct 24, 2025 · Updated 9 months ago
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
rvgpu / rvgpu
☆20 · Nov 4, 2024 · Updated last year
tongzhou80 / nanoPyC
☆69 · Mar 19, 2023 · Updated 3 years ago
NVlabs / mixedproxy
☆15 · Nov 14, 2023 · Updated 2 years ago
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
PAA-NCIC / PPoPP2017_artifact
Third party assembler and GEMM library for NVIDIA Kepler GPU
☆86 · Oct 8, 2019 · Updated 6 years ago
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆74 · Oct 5, 2020 · Updated 5 years ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year
dgSPARSE / dgSPARSE-Lib
PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity
☆122 · Jul 13, 2026 · Updated last week
AnonymousRepo123 / AlphaSparse
An intelligent matrix format designer for SpMV
☆10 · Oct 10, 2023 · Updated 2 years ago
fishmingyu / qrv2-gpu-mode
Batched square compact-Householder QR factorization.
☆14 · Jul 2, 2026 · Updated 3 weeks ago
gpgpu-sim / gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for…
☆1,676 · Feb 15, 2025 · Updated last year
GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆54 · Mar 24, 2024 · Updated 2 years ago