canonizer / halloc
A fast and highly scalable GPU dynamic memory allocator
☆103 · Updated 9 years ago
Related projects:
- Full-speed Array of Structures access · ☆155 · Updated last year
- This repository contains my experiments with compression-related algorithms · ☆35 · Updated 8 years ago
- An assembler/compiler for AMD’s GCN (Graphics Core Next) assembly language · ☆39 · Updated last year
- Giddy - A lightweight GPU decompression library · ☆42 · Updated 5 years ago
- Enabling on-the-fly manipulations with LLVM IR code of CUDA sources · ☆97 · Updated last year
- UME::SIMD: A library for explicit SIMD vectorization · ☆90 · Updated 6 years ago
- mallocMC: Memory Allocator for Many Core Architectures · ☆50 · Updated 3 weeks ago
- The Berkeley Container Library · ☆120 · Updated last year
- A framework that helps implement swizzle GPU kernels · ☆38 · Updated 4 years ago
- GCN ISA assembler tool for my GSoC project at Openwall · ☆34 · Updated 8 years ago
- Heterogeneous Active Messages C++ library · ☆21 · Updated 4 years ago
- RV: A Unified Region Vectorizer for LLVM · ☆102 · Updated 2 months ago
- GPUfs - File system support for NVIDIA GPUs · ☆87 · Updated 5 years ago
- CUDA and OpenMP implementations of C2R/R2C in-place transposition · ☆44 · Updated 9 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth · ☆96 · Updated 7 years ago
- Sample programs for the LLVM PTX back-end · ☆34 · Updated 9 years ago
- A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels · ☆16 · Updated 9 years ago
- Pointer-chasing memory benchmark (forked from Doug Pase's code) · ☆57 · Updated 10 years ago
- Use CUDA intrinsics with user-defined types · ☆47 · Updated 10 years ago
- TTC: A high-performance Compiler for Tensor Transpositions · ☆20 · Updated 6 years ago
- Library with JIT (Just-in-time) compilation support to optimize performance of small and medium matrix multiplication · ☆12 · Updated 3 years ago
- Implementation of the SYCL specification · ☆68 · Updated 3 months ago
- Generic system-wide modern C++ for heterogeneous platforms with SYCL from Khronos Group · ☆76 · Updated 3 years ago
- Execution primitives for C++ · ☆154 · Updated 4 years ago
- Lock-free parallel disjoint set data structure (aka UNION-FIND) with path compression and union by rank · ☆58 · Updated 9 years ago
- A heterogeneous multi-GPU level-3 BLAS library · ☆45 · Updated 4 years ago
- This is a header-only library offering a variety of dynamically growing concurrent hash tables that all work by dynamically migrating th… · ☆106 · Updated 6 months ago
- Launching collective tasks in bulk · ☆36 · Updated 4 years ago