lin-toto / recoil

Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability

☆16

Alternatives and similar repositories for recoil

Users who are interested in recoil are comparing it to the libraries listed below.

owensgroup / MVGpuBTree
GPU B-Tree with support for versioning (snapshots).
☆49 Updated 8 months ago
InstLatx64 / InstLatX64_Demo
InstLatX64_Demo
☆43 Updated last month
AMDResearch / DAGEE
Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta…
☆46 Updated 3 years ago
intel / vc-intrinsics
☆58 Updated last month
KhronosGroup / SYCL_Reference
SYCL Reference Manual
☆28 Updated last year
WojciechMula / simd-heap
☆10 Updated 5 months ago
libxsmm / libxsmm-dnn
Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP)
☆12 Updated 3 months ago
ProjectPhysX / PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆55 Updated 3 months ago
ashvardanian / ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
☆99 Updated last month
oneapi-src / unified-runtime
☆48 Updated this week
intel / ittapi
Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs
☆117 Updated 2 weeks ago
corsix / fast-crc32
Fast CRC32 implementations
☆80 Updated 3 weeks ago
scalable-analyses / sme
☆26 Updated 3 months ago
szcompressor / cuSZ
A GPU accelerated error-bounded lossy compression for scientific data.
☆84 Updated last month
ROCm / TransferBench
TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)
☆42 Updated this week
mklarqvist / libalgebra
Fast C header-only library for popcnt, pospopcnt, and set algebraic operations
☆45 Updated 5 years ago
amd / aocl-compression
A software library of lossless data compression methods tuned and optimized for AMD “Zen”-based CPUs
☆29 Updated last week
owensgroup / GpuBTree
Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019
☆57 Updated 3 years ago
nadavrot / fast_log
A fast implementation of log() and exp()
☆53 Updated 2 years ago
ROCm / rocm_bandwidth_test
Bandwidth test for ROCm
☆59 Updated 2 weeks ago
intel / DTO
A user level library for applications to transparently use Intel DSA.
☆38 Updated 2 weeks ago
owensgroup / BGHT
BGHT: High-performance static GPU hash tables.
☆68 Updated last week
celerity / ndzip
A High-Throughput Parallel Lossless Compressor for Scientific Data
☆70 Updated 2 years ago
kunpengcompute / AvxToNeon
Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.
☆123 Updated last year
intel / iaa-plugin-rocksdb
☆16 Updated 3 months ago
cwida / ALP
ALP: Adaptive Lossless Floating-Point Compression
☆107 Updated 2 months ago
oneapi-src / unified-memory-framework
A library for constructing allocators and memory pools. It also contains broadly useful abstractions and utilities for memory management.…
☆66 Updated this week
oneapi-src / distributed-ranges
Distributed ranges is a generalization of C++ ranges for distributed data structures.
☆51 Updated last week
intel / uintr-ipc-bench
☆37 Updated last year
microsoft / BLAS-on-flash
Linear algebra subroutines for large SSD-resident dense and sparse matrices
☆27 Updated 4 years ago