weissenberger / gpuhd
Massively Parallel Huffman Decoding on GPUs
☆47 · Updated 5 years ago
Alternatives and similar repositories for gpuhd:
Users interested in gpuhd are comparing it to the libraries listed below.
- Massively Parallel ANS Decoding on GPUs · ☆28 · Updated 5 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… · ☆35 · Updated 9 years ago
- GPU-Accelerated Lossless Data Compressors Survey · ☆113 · Updated 4 years ago
- Giddy - A lightweight GPU decompression library · ☆42 · Updated 5 years ago
- AVX512F and AVX2 versions of quick sort · ☆105 · Updated 7 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth · ☆104 · Updated 7 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly. · ☆109 · Updated 2 years ago
- Portable 128-bit SIMD intrinsics · ☆57 · Updated last year
- TLB Benchmarks · ☆33 · Updated 7 years ago
- GPUDirect Async support for IB Verbs · ☆95 · Updated 2 years ago
- immintrin_dbg.h is an include file, a wrapper around immintrin.h. It implements most of the AVX, AVX2, and AVX-512 vector intrinsics to enable so… · ☆57 · Updated 2 years ago
- ROCm - AMDGPU Compute Application Binary Interface · ☆41 · Updated 2 years ago
- LLVM AMDGPU Assembler Helper Tools · ☆111 · Updated 7 years ago
- Next generation FFT implementation for ROCm · ☆185 · Updated this week
- A High-Throughput Parallel Lossless Compressor for Scientific Data · ☆62 · Updated 2 years ago
- Encapsulates frequently used AVX instructions as independent modules to reduce repetitive development work. · ☆116 · Updated last year
- A GPU accelerated error-bounded lossy compression for scientific data. · ☆69 · Updated this week
- IMPORTANT NOTICE: This implementation is long outdated. The new libwfv will be released soon. Whole-Function Vectorization is an algorith… · ☆22 · Updated 12 years ago
- Provides a set of benchmarks for measuring the memory bandwidth performance of CPUs · ☆82 · Updated 9 months ago
- A fast and highly scalable GPU dynamic memory allocator · ☆103 · Updated 9 years ago
- Emulating DMA Engines on GPUs for Performance and Portability · ☆35 · Updated 9 years ago
- UME::SIMD A library for explicit simd vectorization. · ☆91 · Updated 7 years ago
- GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs · ☆14 · Updated 10 months ago
- portDNN is a library implementing neural network algorithms written using SYCL · ☆109 · Updated 8 months ago
- Asynchronous Task and Memory Interface, or ATMI, is a runtime framework and programming model for heterogeneous CPU-GPU systems. It provi… · ☆66 · Updated 11 months ago
- Full-speed Array of Structures access · ☆164 · Updated last year