Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.
☆73 · Nov 4, 2015 · Updated 10 years ago
Alternatives and similar repositories for int_fastdiv
Users that are interested in int_fastdiv are comparing it to the libraries listed below.
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Jun 14, 2023 · Updated 2 years ago
- A simple but efficient C++ thread/worker pool library for asynchronous task management. ☆10 · Jul 11, 2023 · Updated 2 years ago
- ☆16 · Jul 28, 2021 · Updated 4 years ago
- ☆13 · Aug 28, 2025 · Updated 6 months ago
- My very own vxsort re-implemented with "modern" C++ by a complete idiot (in C++) ☆31 · Mar 15, 2026 · Updated last week
- OpenMP offload playground ☆10 · Nov 16, 2024 · Updated last year
- Refinements of the WFA alignment algorithm with better complexity ☆26 · Mar 31, 2022 · Updated 3 years ago
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- A library to benchmark CUDA code, similar to google benchmark. ☆31 · Apr 18, 2021 · Updated 4 years ago
- Gray-Scott reaction-diffusion system in 3D using CUDA ☆12 · Jun 8, 2019 · Updated 6 years ago
- Artifact for 'Register Optimizations for Stencils on GPUs' ☆10 · Sep 18, 2018 · Updated 7 years ago
- ☆10 · May 20, 2022 · Updated 3 years ago
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆569 · Sep 15, 2025 · Updated 6 months ago
- Materials for workshop on GPU computation for statistics, data science, machine learning applications. ☆14 · Sep 8, 2016 · Updated 9 years ago
- ☆44 · Updated this week
- Finite State Coder ☆15 · Apr 17, 2015 · Updated 10 years ago
- OpenMP front-end based on LLVM for CGRAs ☆10 · Oct 2, 2022 · Updated 3 years ago
- LOGAN: High-Performance Multi-GPU X-Drop Long-Read Alignment. ☆30 · Sep 23, 2022 · Updated 3 years ago
- nVidia's CUDA accelerated Spin Transformations of Discrete Surfaces, based on the original code and paper by Keenan Crane, Ulrich Pinkall… ☆17 · Mar 14, 2018 · Updated 8 years ago
- CUDA accelerated(X) Multi-Precision library ☆95 · Sep 9, 2016 · Updated 9 years ago
- A unified framework across multiple programming platforms ☆43 · Feb 10, 2026 · Updated last month
- C implementation of the Landau-Vishkin algorithm ☆35 · Apr 8, 2022 · Updated 3 years ago
- Chapel-based Optimization ☆14 · Oct 26, 2025 · Updated 4 months ago
- ☆16 · Dec 24, 2024 · Updated last year
- ☆15 · Nov 30, 2023 · Updated 2 years ago
- Sweep and Tiniest Queue & Tight-Inclusion GPU CCD ☆20 · Oct 3, 2025 · Updated 5 months ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆86 · Feb 21, 2024 · Updated 2 years ago
- CUDA Kernel Benchmarking Library ☆831 · Mar 17, 2026 · Updated last week
- Network based loader and flasher for Pano G2 devices ☆15 · Jul 8, 2023 · Updated 2 years ago
- ☆21 · Aug 21, 2023 · Updated 2 years ago
- RISC-V System on Chip Builder ☆12 · Sep 27, 2020 · Updated 5 years ago
- Scalable Integer Sort application for co-design in the exascale era ☆19 · Apr 12, 2021 · Updated 4 years ago
- This is an enhanced GPU MPM framework with explicit solver ☆19 · Oct 20, 2025 · Updated 5 months ago
- A simple example showing how to implement a DDA based screen-space ray marcher in Unity ☆13 · Apr 18, 2017 · Updated 8 years ago
- ☆31 · Mar 26, 2021 · Updated 4 years ago
- The source code of the paper An Eigenanalysis of Angle-Based Deformation Energies ☆18 · Sep 12, 2023 · Updated 2 years ago
- Taichi Implementation of "The Power Particle-in-Cell Method" ☆21 · Aug 21, 2022 · Updated 3 years ago
- The course notes and sample code for the physically based simulation course given in GAMES Xi'an 05/14/2021 ☆27 · Sep 28, 2021 · Updated 4 years ago
- A CUDA accelerated utility for using HyperLogLog's for cardinality estimation ☆19 · Dec 26, 2012 · Updated 13 years ago