OpenPPL/CuAssembler

An unofficial CUDA assembler for all generations of SASS, hopefully :)

☆85

Alternatives and similar repositories for CuAssembler

Users who are interested in CuAssembler are comparing it to the repositories listed below.

OpenPPL / hpcc
CMake configurations for PPL projects
☆12 · Aug 10, 2024 · Updated last year
OpenPPL / ppl.common
Common libraries for PPL projects
☆31 · Mar 10, 2025 · Updated last year
OpenPPL / ppl.nn.llm
☆140 · Apr 23, 2024 · Updated 2 years ago
cloudcores / CuAssembler
An unofficial CUDA assembler for all generations of SASS, hopefully :)
☆609 · Apr 20, 2023 · Updated 3 years ago
OpenPPL / ppl.kernel.cpu
☆19 · Apr 6, 2024 · Updated 2 years ago
daadaada / gas
☆49 · Dec 11, 2020 · Updated 5 years ago
OpenPPL / ppl.llm.kernel.cuda
☆150 · Jan 9, 2025 · Updated last year
pigirons / conv3x3_m1
A demo of how to write a high-performance convolution that runs on Apple silicon
☆56 · Feb 8, 2022 · Updated 4 years ago
OpenPPL / ppl.nn
A primitive library for neural networks
☆1,367 · Nov 24, 2024 · Updated last year
OpenPPL / ppl.llm.serving
☆128 · Dec 24, 2024 · Updated last year
Bruce-Lee-LY / flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆45 · Feb 27, 2025 · Updated last year
Alcanderian / CUDA-tutorial
☆14 · Nov 2, 2018 · Updated 7 years ago
OpenPPL / ppl.cv
ppl.cv is a high-performance image processing library of OpenPPL supporting various platforms.
☆515 · Oct 30, 2024 · Updated last year
gty111 / PTX-EMU
PTX-EMU is a simple emulator for CUDA programs.
☆40 · Apr 25, 2025 · Updated last year
NVlabs / NVBit
☆341 · Apr 6, 2026 · Updated 3 months ago
OpenPPL / ppl.kernel.cuda
☆38 · Oct 12, 2024 · Updated last year
OpenPPL / ppl.pmx
☆61 · Nov 21, 2024 · Updated last year
QianyanTech / NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.
☆96 · Feb 23, 2023 · Updated 3 years ago
Oneflow-Inc / oneflow-xrt
☆24 · Apr 25, 2023 · Updated 3 years ago
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆43 · May 29, 2022 · Updated 4 years ago
MARD1NO / CUDA-PPT
☆136 · Apr 16, 2026 · Updated 3 months ago
JohndeVostok / APE
A GPU FP32 computation method with Tensor Cores.
☆27 · Dec 8, 2025 · Updated 7 months ago
wu-kan / HPL-AI
An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3
☆30 · May 30, 2021 · Updated 5 years ago
Oneflow-Inc / oneflow_face
☆12 · Aug 10, 2022 · Updated 3 years ago
octoml / deformable-attention-kernel
TVMScript kernel for deformable attention
☆25 · Dec 15, 2021 · Updated 4 years ago
NVIDIA / nvbench_demo
Simple starter CMake project that uses NVBench.
☆15 · May 6, 2025 · Updated last year
TiledTensor / TiledKernel
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
☆19 · May 12, 2024 · Updated 2 years ago
Jokeren / GPA
GPU Performance Advisor
☆66 · Jul 25, 2022 · Updated 3 years ago
pigirons / cpufp
A CPU tool for benchmarking peak floating-point performance
☆586 · May 4, 2026 · Updated 2 months ago
xiuxiazhang / KeplerAs
An Open Source Kepler GPU Assembler
☆22 · Jan 23, 2017 · Updated 9 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆125 · Jul 11, 2022 · Updated 4 years ago
Adlik / model_zoo
☆11 · Dec 26, 2025 · Updated 6 months ago
tpoisonooo / tengine-pipe
Tengine pipe is a helper tool for quickly producing demos
☆11 · Jul 15, 2021 · Updated 5 years ago
LeiWang1999 / Stream-k.tvm
☆20 · Sep 28, 2024 · Updated last year
thustorage / GCR
Code repo for GCR [FAST'26]
☆16 · Mar 3, 2026 · Updated 4 months ago
pyxis-roc / ptxparser
A parser for PTX 6.5
☆13 · Jun 19, 2023 · Updated 3 years ago
alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆932 · Dec 30, 2024 · Updated last year
LeiWang1999 / AutoGPTQ.tvm
GPTQ inference TVM kernel
☆41 · Apr 25, 2024 · Updated 2 years ago
howardlau1999 / hcache-uring
2022 ECS CloudBuild Distributed Cache Contest - Final Round https://tianchi.aliyun.com/competition/entrance/531982/introduction
☆17 · Dec 8, 2022 · Updated 3 years ago