flame / blis
BLAS-like Library Instantiation Software Framework
☆2,582 · Updated last month
Alternatives and similar repositories for blis
Users who are interested in blis are comparing it to the libraries listed below.
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,341 · Updated 8 months ago
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆929 · Updated this week
- Tuned OpenCL BLAS ☆1,163 · Updated last month
- ArrayFire: a general purpose GPU library. ☆4,842 · Updated 3 months ago
- ☆1,971 · Updated 2 years ago
- LAPACK development repository ☆1,775 · Updated last week
- Patterns and behaviors for GPU computing ☆1,756 · Updated 3 years ago
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,811 · Updated 2 years ago
- An efficient C++20 GPU numerical computing library with Python-like syntax ☆1,384 · Updated last week
- OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version. ☆7,189 · Updated last week
- oneAPI Math Library (oneMath) ☆736 · Updated 3 weeks ago
- High-performance automatic differentiation of LLVM and MLIR. ☆1,520 · Updated this week
- SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT ☆795 · Updated last week
- a software library containing BLAS functions written in OpenCL ☆862 · Updated last year
- The official SuiteSparse library: a suite of sparse matrix algorithms authored or co-authored by Tim Davis, Texas A&M University. ☆1,418 · Updated 2 weeks ago
- Numerical linear algebra software package ☆546 · Updated last week
- trying to collect all useful tutorials for famous C math and linear algebra libraries such as CBLAS, CLAPACK, GSL... ☆438 · Updated 4 years ago
- Assembler for NVIDIA Maxwell architecture ☆1,058 · Updated 2 years ago
- CUDA Core Compute Libraries ☆2,096 · Updated this week
- Programmable CUDA/C++ GPU Graph Analytics ☆1,050 · Updated last year
- Intel® Implicit SPMD Program Compiler ☆2,810 · Updated last week
- pocl - Portable Computing Language ☆1,044 · Updated this week
- A lightweight high performance tensor algebra framework for modern C++ ☆826 · Updated 5 months ago
- C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE, WebAssembly, VSX, RISC-… ☆2,572 · Updated last week
- Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction ☆2,411 · Updated this week
- Thin, unified, C++-flavored wrappers for the CUDA APIs ☆866 · Updated last month
- Source code examples from the Parallel Forall Blog ☆1,314 · Updated 3 months ago
- automatic differentiation made easier for C++ ☆1,893 · Updated 11 months ago
- Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, … ☆1,764 · Updated last week
- C++ tensors with broadcasting and lazy computing ☆3,681 · Updated last month