TTC: A high-performance compiler for tensor transpositions
☆21 · Oct 19, 2017 · Updated 8 years ago
Alternatives and similar repositories for TTC
Users who are interested in TTC compare it to the repositories listed below.
- Tensor Contraction Code Generator · ☆39 · Aug 14, 2017 · Updated 8 years ago
- Automatic High-Order Optimization for Tensors · ☆22 · Apr 14, 2023 · Updated 2 years ago
- High-Performance Tensor Transpose library · ☆205 · May 13, 2023 · Updated 2 years ago
- Sparse matrix-matrix multiplication on CPU+GPU systems. · ☆13 · Mar 17, 2014 · Updated 12 years ago
- ☆12 · Mar 1, 2024 · Updated 2 years ago
- A Google App Engine service that creates AWS accounts on demand using the (beta) Identity and Access Management service. · ☆18 · Dec 30, 2010 · Updated 15 years ago
- Heterogeneous BLAST (H-BLAST), a fast parallel search tool for a heterogeneous computer that couples CPUs and GPUs, to accelerate BLASTX… · ☆12 · Jun 20, 2018 · Updated 7 years ago
- 2D & 3D Jump Flooding Algorithm and 2D Centroidal Voronoi Tessellation based on taichi · ☆11 · Nov 30, 2020 · Updated 5 years ago
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program (mastère spécialisé) · ☆23 · Jan 11, 2024 · Updated 2 years ago
- A fork for the old hhsuite-2.0.16 · ☆13 · Nov 27, 2020 · Updated 5 years ago
- Source code of our implementation of the concurrent RMA · ☆12 · May 23, 2019 · Updated 6 years ago
- Catamount is a compute graph analysis tool to load, construct, and modify deep learning models and to symbolically analyze their compute … · ☆14 · May 18, 2021 · Updated 4 years ago
- A Scala version of my `sbtmkdirs` shell script · ☆11 · Feb 27, 2021 · Updated 5 years ago
- A CUDA implementation of the PageRank Pipeline Benchmark · ☆34 · Jan 31, 2017 · Updated 9 years ago
- Cute layout visualization · ☆33 · Jan 18, 2026 · Updated 2 months ago
- Continuum Dynamics Evaluation and Test Suite · ☆15 · Aug 29, 2017 · Updated 8 years ago
- Towards Hardware and Software Continuous Integration · ☆13 · Jun 8, 2020 · Updated 5 years ago
- Org export engine for Jekyll on Markdown · ☆12 · May 12, 2022 · Updated 3 years ago
- Oz-style dataflow (single-assignment) variables and streams for Scala · ☆42 · Nov 20, 2009 · Updated 16 years ago
- Generating Families of Practical Fast Matrix Multiplication Algorithms · ☆12 · Jul 7, 2017 · Updated 8 years ago
- SSE intrinsics implementation for ECL & SBCL · ☆22 · Mar 2, 2016 · Updated 10 years ago
- Optimizations on Graph500 · ☆10 · Jul 15, 2016 · Updated 9 years ago
- EmerCoin SSH PKI and distributed ACL · ☆15 · Mar 4, 2017 · Updated 9 years ago
- High-Performance Streaming Graph Analytics on GPUs · ☆35 · Jan 28, 2019 · Updated 7 years ago
- BLAS OpenCL implementation. · ☆16 · Apr 8, 2015 · Updated 10 years ago
- A LaTeX package cocktail for grad school level writing/presentation · ☆13 · Feb 11, 2021 · Updated 5 years ago
- Linux kernel source tree with fast swap patches. · ☆20 · Nov 19, 2013 · Updated 12 years ago
- Parallel implementation of k-means clustering using MPI4PY and PyCUDA. · ☆10 · Mar 11, 2019 · Updated 7 years ago
- Nervana GPU library · ☆49 · May 19, 2015 · Updated 10 years ago
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining · ☆12 · Dec 4, 2023 · Updated 2 years ago
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. · ☆37 · Mar 5, 2026 · Updated 2 weeks ago
- Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides (SpTRSM) · ☆14 · Feb 14, 2020 · Updated 6 years ago
- Themis MapReduce and TritonSort · ☆11 · Nov 2, 2017 · Updated 8 years ago
- New batched algorithm for sparse matrix-matrix multiplication (SpMM) · ☆16 · May 7, 2019 · Updated 6 years ago
- Multi-GPU CUDA based scheduler. · ☆13 · Jul 20, 2017 · Updated 8 years ago
- GitHub version of msysgit/git · ☆12 · Mar 20, 2014 · Updated 12 years ago
- Benchmark for Co-running Single Applications on Integrated Architectures · ☆12 · Jul 7, 2016 · Updated 9 years ago
- Benchmark of different C or C++ loggers · ☆12 · Sep 13, 2023 · Updated 2 years ago
- Distributed Performance-portable Stencil Computation · ☆10 · Jul 9, 2023 · Updated 2 years ago