michael-lehn / ulmBLAS
ulmBLAS
☆104 Updated 2 years ago
Related projects
Alternatives and complementary repositories for ulmBLAS
- sparse matrix pre-processing library ☆81 Updated 6 months ago
- High-Performance Tensor Transpose library ☆184 Updated last year
- Full-speed Array of Structures access ☆160 Updated last year
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆45 Updated 9 years ago
- CUDA Tensor Transpose (cuTT) library ☆49 Updated 7 years ago
- ☆11 Updated 8 years ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 4 years ago
- Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays ☆201 Updated 3 months ago
- CUSP : A C++ Templated Sparse Matrix Library ☆403 Updated this week
- A massively-parallel, block-sparse tensor framework written in C++ ☆256 Updated this week
- ☆90 Updated 7 years ago
- Library to plot integer sets and maps ☆47 Updated 7 years ago
- Polyhedral Parallel Code Generation (source repository: http://repo.or.cz/ppcg.git) ☆117 Updated 2 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆103 Updated 9 years ago
- Python wrapper for isl, an integer set library ☆73 Updated this week
- Recursive LAPACK Collection ☆42 Updated 2 years ago
- Combined array and automatic differentiation library in C++ ☆165 Updated 8 months ago
- Fork of magma to include more BLAS ☆28 Updated 7 years ago
- a software library containing Sparse functions written in OpenCL ☆173 Updated 4 years ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆100 Updated last year
- Mirror of the Cephes C source for reference ☆86 Updated 10 months ago
- Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction ☆65 Updated 3 weeks ago
- Vector Math Library ☆75 Updated 7 years ago
- RAJA Performance Suite ☆110 Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆123 Updated last year
- Fast matrix multiplication ☆28 Updated 3 years ago
- A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels ☆18 Updated 9 years ago
- Loop Kernel Analysis and Performance Modeling Toolkit ☆88 Updated 2 months ago
- High-performance object-based library for DLA computations ☆235 Updated 5 months ago
- Use CUDA intrinsics with user-defined types ☆47 Updated 10 years ago