ap-hynninen / cutt
CUDA Tensor Transpose (cuTT) library
☆53 · Aug 10, 2017 · Updated 8 years ago
Alternatives and similar repositories for cutt
Users that are interested in cutt are comparing it to the libraries listed below
- High-Performance Tensor Transpose library ☆205 · May 13, 2023 · Updated 2 years ago
- Tensor Contraction Code Generator ☆39 · Aug 14, 2017 · Updated 8 years ago
- Tensor Algebra Library Routines for Shared Memory Systems ☆39 · Nov 30, 2023 · Updated 2 years ago
- image to column ☆30 · Jul 15, 2014 · Updated 11 years ago
- An Open Source Kepler GPU Assembler ☆21 · Jan 23, 2017 · Updated 9 years ago
- A Winograd Minimal Filter Implementation in CUDA ☆28 · Aug 25, 2021 · Updated 4 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆85 · Oct 8, 2019 · Updated 6 years ago
- Tensor Contraction C++ Library ☆55 · Aug 22, 2019 · Updated 6 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆239 · Jan 13, 2022 · Updated 4 years ago
- ☆32 · Aug 24, 2022 · Updated 3 years ago
- New batched algorithm for sparse matrix-matrix multiplication (SpMM) ☆16 · May 7, 2019 · Updated 6 years ago
- Strassen's Algorithm for Tensor Contraction ☆14 · Jul 7, 2017 · Updated 8 years ago
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs ☆16 · Feb 28, 2019 · Updated 6 years ago
- ☆19 · Aug 21, 2023 · Updated 2 years ago
- benchmarking miopen ☆17 · Jan 14, 2019 · Updated 7 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆567 · Apr 20, 2023 · Updated 2 years ago
- An HPL-AI implementation for Fugaku ☆23 · Jun 29, 2021 · Updated 4 years ago
- CL Offline Compiler: Compile OpenCL kernels to HSAIL ☆50 · May 5, 2017 · Updated 8 years ago
- This repo contains an implementation of the Simple-Update Tensor Network algorithm as described in the paper - A universal tensor network… ☆27 · May 3, 2025 · Updated 9 months ago
- A simple tool to profile performance of multiple combinations of GEMM of cuBLAS ☆25 · Feb 9, 2021 · Updated 5 years ago
- Distributed-parallel C/C++ Tensor Library ☆20 · Sep 18, 2025 · Updated 4 months ago
- SeQuant: Symbolic Algebra of Tensors over Operators and Scalars ☆26 · Updated this week
- A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves (SpTRSV) ☆22 · Feb 14, 2020 · Updated 6 years ago
- Repository for SysML19 Artifacts Evaluation ☆53 · Feb 28, 2019 · Updated 6 years ago
- immintrin_dbg.h is an include file, a wrapper around immintrin.h. It implements most of AVX, AVX2, AVX-512 vector intrinsics to enable so… ☆59 · Jan 7, 2023 · Updated 3 years ago
- Cross interpolation of high-dimensional arrays in tensor train format ☆26 · Feb 10, 2024 · Updated 2 years ago
- Hybrid methods for Parallel Betweenness Centrality on the GPU ☆24 · Dec 20, 2018 · Updated 7 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆109 · Aug 12, 2017 · Updated 8 years ago
- Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys'20) ☆27 · Oct 3, 2023 · Updated 2 years ago
- Comb is a communication performance benchmarking tool. ☆26 · Feb 27, 2023 · Updated 2 years ago
- examples for tvm schedule API ☆101 · Jun 12, 2023 · Updated 2 years ago
- ☆24 · Jun 10, 2019 · Updated 6 years ago
- ☆27 · Feb 8, 2026 · Updated last week
- TensorOperations.jl compatible fast contractor for Julia, based on TBLIS, with generic strides and automatic differentiation support, wit… ☆28 · Dec 20, 2022 · Updated 3 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆76 · Mar 27, 2023 · Updated 2 years ago
- Hierarchical Tensor Networks at Exascale ☆67 · Jul 24, 2023 · Updated 2 years ago
- MIT iQuHACK 2022 x Microsoft x IonQ Challenge ☆10 · Jan 30, 2022 · Updated 4 years ago
- Benchmark for matrix multiplications between dense and block sparse (BSR) matrix in TVM, blocksparse (Gray et al.) and cuSparse. ☆23 · Aug 21, 2020 · Updated 5 years ago
- A pattern-based algorithmic autotuner for graph processing on GPUs. ☆32 · Jun 25, 2025 · Updated 7 months ago