flame / blis
BLAS-like Library Instantiation Software Framework
☆2,316 Updated this week
Related projects
Alternatives and complementary repositories for blis
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆850 Updated this week
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,256 Updated 7 months ago
- Intel® Implicit SPMD Program Compiler ☆2,520 Updated this week
- Tuned OpenCL BLAS ☆1,063 Updated last week
- OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version. ☆6,401 Updated this week
- Vector class library, latest version ☆1,308 Updated 9 months ago
- High-performance automatic differentiation of LLVM and MLIR. ☆1,287 Updated this week
- [ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl ☆2,294 Updated 9 months ago
- automatic differentiation made easier for C++ ☆1,658 Updated last week
- An efficient C++17 GPU numerical computing library with Python-like syntax ☆1,220 Updated this week
- A lightweight high performance tensor algebra framework for modern C++ ☆751 Updated 7 months ago
- ArrayFire: a general purpose GPU library. ☆4,567 Updated 2 weeks ago
- SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT ☆667 Updated last week
- Implementations of SIMD instruction sets for systems which don't natively support them. ☆2,406 Updated this week
- CUDA Core Compute Libraries ☆1,278 Updated this week
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,684 Updated last year
- C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE) ☆2,211 Updated last week
- Patterns and behaviors for GPU computing ☆1,667 Updated 2 years ago
- Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C+… ☆1,390 Updated this week
- oneAPI Threading Building Blocks (oneTBB) ☆5,732 Updated this week
- oneAPI Math Kernel Library (oneMKL) Interfaces ☆622 Updated this week
- Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction ☆2,005 Updated this week
- Assembler for NVIDIA Maxwell architecture ☆950 Updated last year
- ☆1,760 Updated last year
- a software library containing BLAS functions written in OpenCL ☆844 Updated 3 months ago
- BLISlab: A Sandbox for Optimizing GEMM ☆483 Updated 3 years ago
- C++ tensors with broadcasting and lazy computing ☆3,363 Updated 3 months ago
- Performance-portable, length-agnostic SIMD with runtime dispatch ☆4,219 Updated this week
- oneAPI Deep Neural Network Library (oneDNN) ☆3,635 Updated this week
- Compressed numerical arrays that support high-speed random access ☆772 Updated this week