google / memcpy-gemm
☆16 Updated 2 years ago
Alternatives and similar repositories for memcpy-gemm:
Users who are interested in memcpy-gemm are comparing it to the libraries listed below.
- npcomp - An aspirational MLIR based numpy compiler ☆51 Updated 4 years ago
- ☆56 Updated this week
- Portable 128-bit SIMD intrinsics ☆58 Updated last year
- C99/C++ header-only library for division via fixed-point multiplication by inverse ☆50 Updated 11 months ago
- Tests and benchmarks for cudnn (and in the future, other nvidia libraries) ☆53 Updated 4 years ago
- DLA-Future ☆70 Updated this week
- Information about AVX-512 support on recent Intel processors ☆44 Updated 2 years ago
- ☆94 Updated this week
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆106 Updated this week
- SYCL-ML is a C++ library, implementing classical machine learning algorithms using SYCL. ☆66 Updated 5 years ago
- Pybind11 bindings for the Abseil C++ Common Libraries ☆28 Updated 3 weeks ago
- SYCL Reference Manual ☆27 Updated 10 months ago
- Simple C++ code to benchmark fast division algorithms ☆47 Updated 3 years ago
- Mirror kept for legacy. Moved to https://github.com/llvm/llvm-project ☆34 Updated 5 years ago
- C++ convenience classes to be used with CUDA code, for both the host and the kernel parts. ☆55 Updated 6 years ago
- Mirror kept for legacy. Moved to https://github.com/llvm/llvm-project ☆25 Updated 5 years ago
- C++ Header-Only Library for High-Performance Tensor-Vector Multiplication ☆21 Updated 3 months ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆111 Updated 10 months ago
- A Low-Level Abstraction of Memory Access ☆85 Updated last year
- Automatically exported from code.google.com/p/freeocl ☆30 Updated 7 years ago
- A copy of the Intel Cilk Plus runtime system with modifications to work with OpenCilk and its associated tools. ☆12 Updated 4 years ago
- Tools and extensions for CUDA profiling ☆65 Updated 5 years ago
- Concurrent CPU-GPU Programming using Task Models ☆101 Updated 5 years ago
- Boost.org tokenizer module ☆24 Updated last week
- totally unofficial git repo containing sources for the CppMem tool available at http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/help.html and… ☆25 Updated 12 years ago
- ☆18 Updated 8 years ago
- A C/C++ task-based programming model for shared memory and distributed parallel computing. ☆71 Updated 4 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 Updated 9 years ago
- Distributed ranges is a generalization of C++ ranges for distributed data structures. ☆49 Updated last week
- A collection of formatting benchmarks ☆47 Updated last month