bryancatanzaro / inplace
CUDA and OpenMP implementations of C2R/R2C inplace transposition
☆48 · Feb 10, 2015 · Updated 11 years ago
Alternatives and similar repositories for inplace
Users who are interested in inplace are comparing it to the libraries listed below.
- Full-speed Array of Structures access ☆176 · Apr 25, 2023 · Updated 2 years ago
- Fast interpolative decompositions in Python ☆10 · Jan 4, 2021 · Updated 5 years ago
- Oh My Fast Postgres! ☆11 · Feb 4, 2023 · Updated 3 years ago
- Communication Avoiding Numerical Dense Matrix Computations ☆11 · Dec 20, 2020 · Updated 5 years ago
- Improved performance for TensorFlow on Intel hardware. ☆13 · Jun 25, 2018 · Updated 7 years ago
- Strassen's Algorithm for Tensor Contraction ☆14 · Jul 7, 2017 · Updated 8 years ago
- Selected Decomposition Routines ☆20 · Aug 30, 2025 · Updated 5 months ago
- Tensor Contraction Code Generator ☆39 · Aug 14, 2017 · Updated 8 years ago
- The SparseX sparse kernel optimization library ☆43 · Jan 16, 2019 · Updated 7 years ago
- Implement asm gemm on vega64 for 4096x4096 fp32 matrix ☆22 · Oct 12, 2019 · Updated 6 years ago
- Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner ☆21 · Sep 12, 2025 · Updated 5 months ago
- A Chainer extension for K-FAC ☆20 · Jun 16, 2019 · Updated 6 years ago
- OSDI 2023 Welder, deep learning compiler ☆32 · Nov 24, 2023 · Updated 2 years ago
- Data Parallel Python ☆209 · May 10, 2013 · Updated 12 years ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications ☆28 · Nov 27, 2016 · Updated 9 years ago
- MPI accelerator-integrated communication extensions ☆39 · Apr 4, 2023 · Updated 2 years ago
- Library for fast image convolution in neural networks on Intel Architecture ☆30 · Jun 25, 2017 · Updated 8 years ago
- Quantum Computing for Nuclear Physics ☆13 · Jan 9, 2026 · Updated last month
- Fast and easy to use, high frequency trading framework for betfair ☆10 · Sep 16, 2021 · Updated 4 years ago
- ext_mpi_collectives ☆11 · Apr 1, 2025 · Updated 10 months ago
- a lightweight, high performance coroutine implementation ☆39 · Oct 16, 2012 · Updated 13 years ago
- ☆11 · Aug 7, 2024 · Updated last year
- Build and run container environment for LFRic ☆10 · Jan 8, 2024 · Updated 2 years ago
- SKFAC Preconditioner for MindSpore ☆12 · Jul 2, 2021 · Updated 4 years ago
- collection of modules to build distributed and reliable concurrent systems in Python. ☆206 · Sep 14, 2013 · Updated 12 years ago
- SR-VAE ☆10 · Jul 26, 2021 · Updated 4 years ago
- Vikunja is a performance portable algorithm library that defines functions operating on ranges of elements for a variety of purposes. It… ☆16 · Oct 10, 2023 · Updated 2 years ago
- Distributed Communication-Optimal Shuffle and Transpose Algorithm ☆14 · Feb 4, 2026 · Updated last week
- Generating music with Machine Learning ☆11 · Feb 8, 2021 · Updated 5 years ago
- Lupa for Torch ☆10 · Sep 16, 2015 · Updated 10 years ago
- This project provides a series of modules which enable functions of the Vascular Modeling Toolkit (http://www.vmtk.org) in 3D Slicer (htt… ☆16 · Mar 22, 2013 · Updated 12 years ago
- mallocMC: Memory Allocator for Many Core Architectures ☆58 · Feb 2, 2026 · Updated 2 weeks ago
- Use CUDA intrinsics with user-defined types ☆48 · Aug 14, 2014 · Updated 11 years ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆185 · Dec 12, 2022 · Updated 3 years ago
- A batch (multiple concurrent sequence pairs) implementation of Dynamic Time Warping (DTW) in Theano ☆10 · Sep 13, 2015 · Updated 10 years ago
- Not Another Range Library ☆39 · Mar 9, 2014 · Updated 11 years ago
- design, run and test desired situations using human, AI, bot or computer control ☆11 · Oct 9, 2018 · Updated 7 years ago
- Various C++ utilities we have collected for cross product use ☆10 · Nov 1, 2025 · Updated 3 months ago
- Wrapper for generating PROV provenance information for commands and python scripts ☆15 · Oct 14, 2014 · Updated 11 years ago