NVIDIA/jitify

A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).

☆573

Alternatives and similar repositories for jitify

Users who are interested in jitify are comparing it to the libraries listed below.

eyalroz / cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
☆899 · Updated this week
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆898 · Updated this week
NVIDIA / cuCollections
☆654 · Updated this week
NVIDIA / cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,840 · Oct 9, 2023 · Updated 2 years ago
NVIDIA / libcudacxx
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
☆2,304 · Feb 7, 2024 · Updated 2 years ago
NVIDIA / nvcomp
Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloade…
☆627 · Updated this week
moderngpu / moderngpu
Patterns and behaviors for GPU computing
☆1,782 · Jan 17, 2026 · Updated 6 months ago
NVIDIA / MatX
An efficient C++20 GPU numerical computing library with Python-like syntax
☆1,437 · Updated this week
rapidsai / rmm
RAPIDS Memory Manager
☆705 · Updated this week
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆396 · May 31, 2026 · Updated last month
stotko / stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
☆1,265 · Jul 8, 2026 · Updated last week
NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,398 · Updated this week
NVIDIA / NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆544 · Updated this week
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆908 · Sep 26, 2025 · Updated 9 months ago
harrism / ranger
Generate simple index ranges in C++ and CUDA C++
☆39 · Jun 14, 2023 · Updated 3 years ago
NVIDIA / thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
☆5,004 · Feb 8, 2024 · Updated 2 years ago
gunrock / gunrock
Programmable CUDA/C++ GPU Graph Analytics
☆1,096 · Feb 28, 2026 · Updated 4 months ago
NVIDIA / cccl
CUDA Core Compute Libraries
☆2,425 · Updated this week
eyalroz / cuda-kat
CUDA kernel author's tools
☆116 · Apr 24, 2022 · Updated 4 years ago
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,101 · Updated this week
bryancatanzaro / trove
Full-speed Array of Structures access
☆177 · Apr 25, 2023 · Updated 3 years ago
trxcllnt / rapids-compose
☆27 · Dec 20, 2023 · Updated 2 years ago
ROCm / rocPRIM
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆176 · Updated this week
csarofeen / pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
☆27 · Apr 20, 2023 · Updated 3 years ago
ROCm / hipCUB
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆83 · Updated this week
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆1,002 · Sep 19, 2024 · Updated last year
alpaka-group / alpaka
Abstraction Library for Parallel Kernel Acceleration
☆419 · Jun 25, 2026 · Updated 3 weeks ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆115 · Jul 12, 2024 · Updated 2 years ago
NVIDIA / cnmem
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
☆298 · Nov 28, 2018 · Updated 7 years ago
rapidsai / rapids-cmake
☆46 · Updated this week
harrism / hemi
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
☆348 · Apr 14, 2022 · Updated 4 years ago
llnl / RAJA
RAJA Performance Portability Layer (C++)
☆589 · Updated this week
llnl / Umpire
An application-focused API for memory management on NUMA & GPU architectures
☆416 · Updated this week
libocca / occa
Portable and vendor neutral framework for parallel programming on heterogeneous platforms.
☆442 · Nov 7, 2025 · Updated 8 months ago
llnl / camp
Compiler agnostic metaprogramming library providing concepts, type operations and tuples for C++ and cuda
☆104 · Updated this week
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,890 · Updated this week
NVIDIA / VisRTX
NVIDIA OptiX based implementation of ANARI
☆279 · Updated this week
llnl / blt
A streamlined CMake build system foundation for developing HPC software
☆293 · Updated this week
xtensor-stack / xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE, WebAssembly, VSX, RISC-…
☆2,721 · Updated this week