lin-toto / recoil
Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability
☆15Updated last year
Alternatives and similar repositories for recoil
Users who are interested in recoil are comparing it to the libraries listed below.
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆52Updated 2 months ago
- SYCL Reference Manual☆28Updated last year
- ☆10Updated 4 months ago
- InstLatX64_Demo☆43Updated last week
- A High-Throughput Parallel Lossless Compressor for Scientific Data☆65Updated 2 years ago
- A fast implementation of log() and exp()☆53Updated 2 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU…☆35Updated 9 years ago
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator.☆39Updated 3 years ago
- ☆57Updated this week
- GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs☆14Updated last month
- Massively Parallel ANS Decoding on GPUs☆28Updated 5 years ago
- A GPU accelerated error-bounded lossy compression for scientific data.☆75Updated last week
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆25Updated 7 months ago
- GPU-Accelerated Lossless Data Compressors Survey☆115Updated 4 years ago
- GPU B-Tree with support for versioning (snapshots).☆47Updated 7 months ago
- C++ "borrowing" smart pointer.☆11Updated 3 years ago
- ☆29Updated 2 years ago
- ☆15Updated 2 years ago
- ☆70Updated 4 years ago
- immintrin_dbg.h is an include file, a wrapper around immintrin.h. It implements most of AVX, AVX2, AVX-512 vector intrinsics to enable so…☆56Updated 2 years ago
- Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP)☆12Updated last month
- ☆51Updated 5 years ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019☆55Updated 2 years ago
- Fast C header-only library for popcnt, pospopcnt, and set algebraic operations☆45Updated 5 years ago
- ☆27Updated last year
- An enumerator for MLIR, relying on the information given by IRDL.☆19Updated this week
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!☆98Updated this week
- ☆29Updated last month
- CLI utility to work out proper constants for vpternlogic instruction☆13Updated 2 years ago
- Code and results for our paper "Analyzing Vectorized Hash Tables Across CPU Architectures" @ VLDB '23.☆25Updated last year