eyalroz / libgiddy
Giddy - A lightweight GPU decompression library
☆42 Updated 5 years ago
Alternatives and similar repositories for libgiddy:
Users who are interested in libgiddy are comparing it to the libraries listed below.
- mallocMC: Memory Allocator for Many Core Architectures ☆55 Updated 2 weeks ago
- ☆75 Updated last year
- This repository contains my experiments with compression-related algorithms ☆35 Updated 8 years ago
- UME::SIMD A library for explicit simd vectorization. ☆90 Updated 7 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆104 Updated 10 years ago
- Asynchronous Task and Memory Interface, or ATMI, is a runtime framework and programming model for heterogeneous CPU-GPU systems. It provi… ☆66 Updated last year
- Full-speed Array of Structures access ☆169 Updated last year
- A High-Throughput Parallel Lossless Compressor for Scientific Data ☆64 Updated 2 years ago
- Generic SIMD intrinsic to allow for portable SIMD intrinsic programming ☆42 Updated 11 years ago
- A portable high-level API with CUDA or OpenCL back-end ☆54 Updated 7 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 Updated 9 years ago
- OpenCL/SPIR-V implementation of HIP ☆104 Updated 2 years ago
- Manual for the C++ vector class library ☆30 Updated last year
- RTX compute samples ☆70 Updated last year
- ☆31 Updated 3 years ago
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast! ☆95 Updated last month
- GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs ☆14 Updated last year
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- Generic system-wide modern C++ for heterogeneous platforms with SYCL from Khronos Group ☆76 Updated 4 years ago
- C++ convenience classes to be used with CUDA code, for both the host and the kernel parts. ☆55 Updated 6 years ago
- data-parallel out-of-core library ☆50 Updated last week
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 Updated 7 years ago
- AVX512F and AVX2 versions of quick sort ☆105 Updated 7 years ago
- Fast C header-only library for popcnt, pospopcnt, and set algebraic operations ☆45 Updated 5 years ago
- GPU-Accelerated Lossless Data Compressors Survey ☆114 Updated 4 years ago
- Portable 128-bit SIMD intrinsics ☆58 Updated last year
- Range-based for loops to iterate over a range of numbers or values ☆35 Updated 8 years ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆55 Updated 2 years ago
- ☆68 Updated 2 years ago
- Experimental ranges for CUDA ☆24 Updated 6 years ago