Fast integer division for divisors that are not known at compile time. Intended primarily for use in CUDA kernels.
☆73 · Nov 4, 2015 · Updated 10 years ago
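The description refers to dividing by a value that is only known at run time. As a rough illustration of the general idea (precompute a reciprocal once, then replace each division with a multiply and a shift), here is a minimal host-side C++ sketch. It is not the int_fastdiv code or API: the `FastDivU32` type and its `divide` method are hypothetical names, the sketch covers only unsigned 32-bit operands with a divisor greater than 1, and the repository itself may use a different scheme.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not the int_fastdiv API): caches a 64-bit
// reciprocal of an unsigned 32-bit divisor that is chosen at run time.
struct FastDivU32 {
    uint32_t d;  // the divisor; this sketch requires d > 1
    uint64_t m;  // ceil(2^64 / d), computed once per divisor

    explicit FastDivU32(uint32_t divisor) : d(divisor), m(0) {
        // d == 0 is undefined, and d == 1 would make the reciprocal wrap to 0.
        assert(divisor > 1);
        m = UINT64_MAX / divisor + 1;
    }

    // Returns floor(n / d) as the top bits of the 96-bit product m * n.
    // The product is assembled from two 64-bit partial products, so no
    // 128-bit integer type is needed.
    uint32_t divide(uint32_t n) const {
        uint64_t lo = (m & 0xFFFFFFFFu) * n;  // low 32 bits of m, times n
        uint64_t hi = (m >> 32) * n;          // high 32 bits of m, times n
        return static_cast<uint32_t>((hi + (lo >> 32)) >> 32);
    }
};
```

A caller would construct the object once per divisor, for example `FastDivU32 by7(7);`, and then call `by7.divide(n)` wherever `n / 7` would otherwise appear in a hot loop. Marking the members `__host__ __device__` for use inside a CUDA kernel is a plausible adaptation, but it is an assumption that this sketch does not test.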
Alternatives and similar repositories for int_fastdiv
Users who are interested in int_fastdiv compare it with the libraries listed below.
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Jun 14, 2023 · Updated 2 years ago
- A simple but efficient C++ thread/worker pool library for asynchronous task management. ☆10 · Jul 11, 2023 · Updated 2 years ago
- Repository with examples for the C++20 Coroutines video and article. ☆29 · Jul 10, 2024 · Updated last year
- My very own vxsort re-implemented with "modern" C++ by a complete idiot (in C++) ☆31 · Feb 25, 2026 · Updated last week
- Common code library ☆14 · Feb 3, 2018 · Updated 8 years ago
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- C++11 header-only, contiguous-storage double-ended vector implementation similar to STL's std::vector for efficient insertions/removals at… ☆16 · Dec 29, 2022 · Updated 3 years ago
- A CUDA-accelerated utility for using HyperLogLogs for cardinality estimation ☆19 · Dec 26, 2012 · Updated 13 years ago
- Using PyTorch autograd to compute Hessian of Perplexity for Large Language Models ☆27 · Apr 17, 2025 · Updated 10 months ago
- Pseudo-LRU implementation using 1-bit per entry and achieving Full-LRU performance. ☆22 · Dec 17, 2022 · Updated 3 years ago
- Fundamental Sources for Water Wave Animation ☆20 · Dec 8, 2022 · Updated 3 years ago
- Variadic recursive expression templates with lazy evaluation which look like ordinary (possibly nested) containers. ☆17 · Feb 5, 2023 · Updated 3 years ago
- "Guided Visibility Sampling++", an aggressive from-region visibility algorithm. The implementation is based on the Vulkan graphics API. ☆22 · Apr 18, 2021 · Updated 4 years ago
- NUMA bindings for Go, requires libnuma. ☆26 · Nov 18, 2019 · Updated 6 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format ☆22 · Jun 8, 2018 · Updated 7 years ago
- The course notes and sample code for the physically based simulation course given in GAMES Xi'an 05/14/2021 ☆27 · Sep 28, 2021 · Updated 4 years ago
- A demo of how to write a high-performance convolution that runs on Apple silicon ☆57 · Feb 8, 2022 · Updated 4 years ago
- Compute one-side Hausdorff distance between triangle meshes with error bound. ☆29 · Jul 27, 2022 · Updated 3 years ago
- BGHT: High-performance static GPU hash tables. ☆71 · Jul 2, 2025 · Updated 8 months ago
- A library to benchmark CUDA code, similar to google benchmark. ☆31 · Apr 18, 2021 · Updated 4 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆86 · Feb 21, 2024 · Updated 2 years ago
- Massively Parallel ANS Decoding on GPUs ☆30 · Jul 26, 2019 · Updated 6 years ago
- Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning" ☆25 · Feb 24, 2023 · Updated 3 years ago
- Refinements of the WFA alignment algorithm with better complexity ☆26 · Mar 31, 2022 · Updated 3 years ago
- Code written while learning Taichi and GAMES201 ☆30 · Nov 18, 2020 · Updated 5 years ago
- PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity ☆120 · Dec 22, 2025 · Updated 2 months ago
- A Hello World example using CMake and MSVC ☆27 · Oct 13, 2020 · Updated 5 years ago
- Multi-GPU dynamic scheduler using PGAS style cross-GPU communication ☆29 · Jul 23, 2023 · Updated 2 years ago
- FPGA CryptoNight V7 Miner ☆31 · Aug 26, 2019 · Updated 6 years ago
- LOGAN: High-Performance Multi-GPU X-Drop Long-Read Alignment. ☆30 · Sep 23, 2022 · Updated 3 years ago
- Declarative MLIR compilers in Python! ☆36 · Oct 9, 2020 · Updated 5 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly. ☆127 · Jan 17, 2023 · Updated 3 years ago
- SPMD in C++ ☆68 · Apr 29, 2020 · Updated 5 years ago