CharithYMendis / Helium
Helium: Lifting High-Performance Stencil Kernels from Stripped x86 Binaries to Halide DSL Code
☆45 · Updated 8 years ago
Alternatives and similar repositories for Helium:
Users who are interested in Helium are comparing it to the libraries listed below:
- A heterogeneous multi-GPU level-3 BLAS library ☆45 · Updated 5 years ago
- Intel Heterogeneous Research Compiler (iHRC) ☆25 · Updated 2 years ago
- Generic system-wide modern C++ for heterogeneous platforms with SYCL from Khronos Group ☆76 · Updated 4 years ago
- ☆75 · Updated last year
- Scientific library for high-precision computations and research ☆50 · Updated 7 years ago
- GCN ISA assembler tool for my GSoC project at Openwall ☆35 · Updated 9 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆104 · Updated 9 years ago
- An assembler/compiler for AMD’s GCN (Graphics Core Next) architecture assembly language ☆39 · Updated 2 years ago
- A portable high-level API with CUDA or OpenCL back-end ☆54 · Updated 7 years ago
- Benchmarking matrix multiplication implementations ☆98 · Updated 8 years ago
- A C++ expression -> x86 JIT ☆18 · Updated 7 years ago
- CL Offline Compiler: Compile OpenCL kernels to HSAIL ☆50 · Updated 7 years ago
- A domain-specific language and compiler for image processing ☆76 · Updated 3 years ago
- ☆88 · Updated 5 years ago
- Tools for parsing, assembling, and disassembling HSAIL. ☆71 · Updated 4 years ago
- Python bindings for libNVVM ☆37 · Updated 10 years ago
- Checks to verify the usage of the MPI API in C and C++ code, based on Clang’s Static Analyzer and Clang-Tidy. ☆38 · Updated 5 months ago
- Easy to run kernels using OpenCL ☆183 · Updated 7 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆46 · Updated 10 years ago
- Enable Polyhedral JIT compilation ☆9 · Updated 6 years ago
- Fork of magma to include more BLAS ☆28 · Updated 8 years ago
- Enabling on-the-fly manipulations with LLVM IR code of CUDA sources ☆109 · Updated 2 years ago
- [deprecated] Reference Implementation of OpenSHMEM on GASNet (specification <= 1.3) ☆43 · Updated 7 years ago
- Full-speed Array of Structures access ☆164 · Updated last year
- A compiler intermediate representation for image recognition and heterogeneous computing. ☆78 · Updated 8 years ago
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL ☆135 · Updated 7 years ago
- BSP implementation for the Parallella, the world's smallest supercomputer ☆27 · Updated 7 years ago
- The cilkplus/llvm repo implements the Intel Cilk Plus language extensions to C and C++ in LLVM. ☆68 · Updated 9 years ago
- Intel(R) Concurrent Collections for C++ ☆115 · Updated 2 years ago
- GPU Automatically Tuned Linear Algebra Software ☆28 · Updated 9 years ago