lin-toto / recoil
Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability
☆15 Updated last year
Alternatives and similar repositories for recoil:
Users who are interested in recoil are comparing it to the libraries listed below.
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last week
- GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs ☆14 Updated last year
- Massively Parallel ANS Decoding on GPUs ☆28 Updated 5 years ago
- Triton to TVM transpiler. ☆19 Updated 5 months ago
- A High-Throughput Parallel Lossless Compressor for Scientific Data ☆64 Updated 2 years ago
- GPU B-Tree with support for versioning (snapshots). ☆47 Updated 5 months ago
- An extension library of WMMA API (Tensor Core API) ☆93 Updated 8 months ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆24 Updated 3 months ago
- GPU-Accelerated Lossless Data Compressors Survey ☆114 Updated 4 years ago
- A fast implementation of log() and exp() ☆53 Updated 2 years ago
- ☆10 Updated 2 months ago
- ☆56 Updated last week
- A GPU FP32 computation method with Tensor Cores. ☆20 Updated 2 years ago
- SYCL Reference Manual ☆27 Updated 11 months ago
- A GPU-accelerated error-bounded lossy compressor for scientific data. ☆73 Updated 2 weeks ago
- InstLatX64_Demo ☆42 Updated last month
- Utilities for accessing AMD's Machine-Readable GPU ISA Specifications. ☆31 Updated 3 weeks ago
- TiledKernel is a code generation library based on macro kernels and a memory hierarchy graph data structure. ☆19 Updated 10 months ago
- BGHT: High-performance static GPU hash tables. ☆62 Updated 6 months ago
- A translator from NVPTX to SPIR-V, modified from the LLVM-SPIR-V Translator. ☆37 Updated 3 years ago
- The goal of the library is to help with research in the area of data compression. This is not meant to be fast or efficient implementatio… ☆85 Updated last month
- ☆24 Updated last year
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 5 months ago
- PTX-EMU is a simple emulator for CUDA programs. ☆30 Updated last year
- ☆13 Updated last year
- TurboRC - Fastest Range Coder + Arithmetic Coding / Fastest Asymmetric Numeral Systems ☆80 Updated last year
- Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP) ☆12 Updated last month
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen) ☆34 Updated last week
- An enumerator for MLIR, relying on the information given by IRDL. ☆19 Updated 2 weeks ago
- End-to-end steps for adding custom ops in PyTorch. ☆21 Updated 4 years ago