lin-toto / recoil
Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability
☆15 Updated last year
Alternatives and similar repositories for recoil:
Users who are interested in recoil are comparing it to the libraries listed below.
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last month
- InstLatX64_Demo ☆43 Updated 3 weeks ago
- The goal of the library is to help with research in the area of data compression. This is not meant to be fast or efficient implementatio… ☆87 Updated 3 months ago
- A fast implementation of log() and exp() ☆53 Updated 2 years ago
- GPU B-Tree with support for versioning (snapshots). ☆47 Updated 6 months ago
- SYCL Reference Manual ☆27 Updated last year
- TurboRC - Fastest Range Coder + Arithmetic Coding / Fastest Asymmetric Numeral Systems ☆80 Updated last year
- GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs ☆14 Updated 3 weeks ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 6 months ago
- rans optimized tricks ☆24 Updated 7 years ago
- ☆26 Updated last year
- Massively Parallel ANS Decoding on GPUs ☆28 Updated 5 years ago
- Triton to TVM transpiler. ☆19 Updated 6 months ago
- Ocolos is the first online code layout optimization system for unmodified applications written in unmanaged languages. ☆52 Updated last year
- BGHT: High-performance static GPU hash tables. ☆63 Updated last month
- A High-Throughput Parallel Lossless Compressor for Scientific Data ☆65 Updated 2 years ago
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) API ☆109 Updated last week
- A GPU accelerated error-bounded lossy compression for scientific data. ☆75 Updated this week
- Massively Parallel Huffman Decoding on GPUs ☆48 Updated 6 years ago
- Fast CRC32 implementations ☆74 Updated last year
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 Updated 9 years ago
- Code and results for our paper "Analyzing Vectorized Hash Tables Across CPU Architectures" @ VLDB '23. ☆25 Updated last year
- This repository contains the source code and dataset link mentioned in the WWW 2022 accepted paper "TRACE: A Fast Transformer-based General-Pu… ☆29 Updated 3 years ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆55 Updated 2 years ago
- ☆15 Updated 8 years ago
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator. ☆38 Updated 3 years ago
- A GPU FP32 computation method with Tensor Cores. ☆20 Updated 2 years ago
- Information about AVX-512 support on recent Intel processors ☆45 Updated 3 years ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆26 Updated 4 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 Updated last month