dingwentao / GPU-lossless-compression
GPU-Accelerated Lossless Data Compressors Survey
☆110 · Updated 4 years ago
Related projects
Alternatives and complementary repositories for GPU-lossless-compression
- Massively Parallel Huffman Decoding on GPUs ☆44 · Updated 5 years ago
- Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloade… ☆560 · Updated 2 months ago
- A GPU accelerated error-bounded lossy compression for scientific data. ☆65 · Updated this week
- TurboRC - Fastest Range Coder + Arithmetic Coding / Fastest Asymmetric Numeral Systems ☆71 · Updated last year
- Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload. ☆114 · Updated 10 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 · Updated 8 years ago
- Massively Parallel ANS Decoding on GPUs ☆28 · Updated 5 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆107 · Updated last year
- Error-bounded Lossy Data Compressor (for floating-point/integer datasets) ☆155 · Updated 7 months ago
- A Library for fast Hash Tables on GPUs ☆109 · Updated 2 years ago
- ☆146 · Updated this week
- TLB Benchmarks ☆32 · Updated 7 years ago
- rocWMMA ☆91 · Updated this week
- An extension library of WMMA API (Tensor Core API) ☆84 · Updated 4 months ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆78 · Updated 5 years ago
- Lossless compressor of multidimensional floating-point arrays ☆106 · Updated 4 years ago
- LLVM AMDGPU Assembler Helper Tools ☆111 · Updated 7 years ago
- amdgpu example code in hip/asm ☆21 · Updated 2 weeks ago
- CUPTI GPU Profiler ☆37 · Updated 5 years ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆223 · Updated this week
- A fast and highly scalable GPU dynamic memory allocator ☆103 · Updated 9 years ago
- oneAPI Collective Communications Library (oneCCL) ☆206 · Updated this week
- A High-Throughput Parallel Lossless Compressor for Scientific Data ☆61 · Updated last year
- AVX512F and AVX2 versions of quick sort ☆105 · Updated 6 years ago
- Flexible GPGPU instrumentation ☆86 · Updated 5 years ago
- Experiments evaluating preemption on the NVIDIA Pascal architecture ☆18 · Updated 8 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 · Updated 10 months ago
- ☆50 · Updated 4 years ago
- ☆67 · Updated 2 years ago