Reverse engineering NVIDIA SASS instruction dictionary, kernel audits and pattern recognition across GPU architectures.
☆306 · May 18, 2026 · Updated last month
Alternatives and similar repositories for sass-king
Users that are interested in sass-king are comparing it to the libraries listed below.
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆13 · Nov 23, 2024 · Updated last year
- A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn. ☆188 · May 15, 2026 · Updated last month
- Website for CSE 234, Winter 2025 ☆16 · Mar 24, 2025 · Updated last year
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts". ☆137 · Sep 24, 2025 · Updated 9 months ago
- Source code 💻 minimap 🗺️ extension for GitHub 🙈 ☆11 · Sep 17, 2018 · Updated 7 years ago
- A survey of manufacturer-provided DRAM operating parameters and timings as specified by DRAM chip datasheets from between 1970 and 2021. … ☆11 · May 4, 2022 · Updated 4 years ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆99 · Jan 16, 2026 · Updated 5 months ago
- Benchmark tests supporting the TiledCUDA library. ☆19 · Nov 19, 2024 · Updated last year
- Evaluation Code repository for the paper "ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers". (2023… ☆13 · Dec 5, 2023 · Updated 2 years ago
- DeeperGEMM: crazy optimized version ☆86 · May 5, 2025 · Updated last year
- ☆40 · Dec 14, 2025 · Updated 6 months ago
- Contest Problems ☆10 · Jun 15, 2018 · Updated 8 years ago
- Open Source SSD Controller. NVMe and Lightstor variants ☆17 · May 21, 2014 · Updated 12 years ago
- A benchmark of real-world DL kernel problems ☆238 · May 28, 2026 · Updated last month
- Automatic differentiation for Triton Kernels ☆29 · Aug 12, 2025 · Updated 10 months ago
- ☆42 · Mar 28, 2024 · Updated 2 years ago
- ☆26 · Feb 17, 2025 · Updated last year
- ☆17 · Aug 2, 2023 · Updated 2 years ago
- Fast stand-alone C++ decoder for RNN-based NMT models ☆31 · Dec 12, 2020 · Updated 5 years ago
- Parsers for CUDA binary files ☆25 · Dec 29, 2023 · Updated 2 years ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆71 · Apr 14, 2025 · Updated last year
- ☆19 · Jun 22, 2026 · Updated last week
- Heterogeneous Accelerated Computed Cluster (HACC) Resources Page ☆22 · Apr 27, 2026 · Updated 2 months ago
- A complete CUDA tutorial ranging from first GPU programs to advanced asynchronous methods ☆30 · Jan 22, 2026 · Updated 5 months ago
- Tools to deploy GPU clusters in the Cloud ☆34 · Apr 4, 2023 · Updated 3 years ago
- A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code ☆16 · Mar 19, 2023 · Updated 3 years ago
- Rust-native GPU kernel authoring framework: write GPU compute kernels in Rust, compile to PTX. The Triton equivalent for the Rust ecosyst… ☆35 · Jun 12, 2026 · Updated 2 weeks ago
- corundum work on vu13p ☆23 · Nov 10, 2023 · Updated 2 years ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆72 · Updated this week
- GVProf: A Value Profiler for GPU-based Clusters ☆54 · Mar 24, 2024 · Updated 2 years ago
- ☆44 · Apr 27, 2026 · Updated 2 months ago
- ☆13 · Apr 10, 2026 · Updated 2 months ago
- Enlightener, the cutting-edge Retrieval-Augmented Generation (RAG) system that revolutionizes query responses. By combining the power of … ☆13 · Jul 28, 2025 · Updated 11 months ago
- ☆26 · Feb 20, 2024 · Updated 2 years ago
- ☆19 · Updated this week
- Chinese translation of the CUDA PTX-ISA documentation ☆55 · Sep 29, 2025 · Updated 9 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆33 · Apr 2, 2025 · Updated last year
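- https://xuruowei.com was left by her family, her friends, and her beloved Gao Ce in her memory. Xu Ruowei passed away on February 28, 2026. We hope to commemorate her life through this timeline: photos, stories, writing, music, and everything she loved. Walk along the path of her life and touch those warm moments once more. ☆28 · Apr 1, 2026 · Updated 3 months ago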
- zMonkey is an open-source 200G network impairment emulator tool ☆26 · Mar 8, 2022 · Updated 4 years ago