dingwentao / GPU-lossless-compression
GPU-Accelerated Lossless Data Compressors Survey
☆110 · Updated 4 years ago
Related projects:
- Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloade… ☆556 · Updated last week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆96 · Updated 7 years ago
- Encapsulates frequently used AVX instructions as independent modules to reduce repeated development work. ☆113 · Updated 8 months ago
- Massively Parallel Huffman Decoding on GPUs ☆40 · Updated 5 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly. ☆107 · Updated last year
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 · Updated 8 years ago
- OpenCL/SPIR-V implementation of HIP ☆104 · Updated last year
- ROCm - AMDGPU Compute Application Binary Interface ☆40 · Updated 2 years ago
- ☆53 · Updated last week
- A GPU-accelerated error-bounded lossy compressor for scientific data. ☆61 · Updated last week
- Example code for Intel AVX / AVX2 intrinsics. ☆123 · Updated last year
- portDNN is a library implementing neural network algorithms written using SYCL ☆106 · Updated 3 months ago
- ☆145 · Updated 3 weeks ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆81 · Updated 6 months ago
- Assembler for NVIDIA Fermi. Imported from Google Code ☆68 · Updated 9 years ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆353 · Updated last month
- Fast integer division with a divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 · Updated 8 years ago
- Third-party assembler and GEMM library for NVIDIA Kepler GPUs ☆76 · Updated 4 years ago
- Assembly GEMM implementation on Vega 64 for 4096x4096 FP32 matrices ☆19 · Updated 4 years ago
- rocWMMA ☆85 · Updated this week
- ☆34 · Updated 3 years ago
- MIOpenGEMM is now deprecated ☆61 · Updated last year
- A profiler to disclose and quantify hardware features on GPUs. ☆158 · Updated 2 years ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆213 · Updated this week
- ☆39 · Updated 3 years ago
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆39 · Updated 8 months ago
- CUPTI GPU Profiler ☆36 · Updated 5 years ago
- ☆82 · Updated 2 weeks ago
- An extension library of WMMA API (Tensor Core API) ☆81 · Updated 2 months ago
- Enabling on-the-fly manipulations with LLVM IR code of CUDA sources ☆97 · Updated last year