NVlabs / xmp
CUDA accelerated(X) Multi-Precision library
☆87 Updated 8 years ago
Alternatives and similar repositories for xmp:
Users who are interested in xmp are comparing it to the libraries listed below.
- The CUDA Multiple Precision Arithmetic Library ☆44 Updated 12 years ago
- CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups ☆206 Updated 3 months ago
- A 128 bit unsigned integer class for CUDA ☆43 Updated 2 weeks ago
- Extended-precision modular arithmetic library that targets CUDA. ☆41 Updated 4 years ago
- A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels ☆18 Updated 9 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly. ☆109 Updated 2 years ago
- Enabling on-the-fly manipulations with LLVM IR code of CUDA sources ☆108 Updated last year
- BLAS implementation for Intel FPGA ☆76 Updated 4 years ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆104 Updated last year
- A framework that helps implement swizzle GPU kernels ☆41 Updated 4 years ago
- Flexible GPGPU instrumentation ☆86 Updated 5 years ago
- Decuda and cudasm, the CUDA binary utilities package. Low-level tools for NVIDIA G80 GPUs. ☆97 Updated 14 years ago
- gpuprec: Extended-Precision Libraries on GPUs ☆35 Updated 9 years ago
- Extended-precision modular arithmetic library that targets CUDA. ☆34 Updated last year
- CUDA kernel author's tools ☆110 Updated 2 years ago
- A Library for fast Hash Tables on GPUs ☆113 Updated 2 years ago
- Chai ☆42 Updated last year
- Tapir extension to LLVM for optimizing Parallel Programs ☆133 Updated 4 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆71 Updated 9 years ago
- A GPU cache model for research purposes ☆26 Updated 11 years ago
- Use CUDA intrinsics with user-defined types ☆47 Updated 10 years ago
- Kernel Tuning Toolkit ☆55 Updated 2 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆281 Updated last year
- SYCL Open Source Specification ☆122 Updated this week
- A task benchmark ☆40 Updated 5 months ago
- Loop Kernel Analysis and Performance Modeling Toolkit ☆91 Updated 4 months ago
- A GPU accelerated implementation of the sieve of Eratosthenes ☆62 Updated 2 years ago
- ☆16 Updated 3 years ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆69 Updated last month
- portDNN is a library implementing neural network algorithms written using SYCL ☆109 Updated 7 months ago