CUDA and OpenMP implementations of C2R/R2C inplace transposition
☆48 · Feb 10, 2015 · Updated 11 years ago
Alternatives and similar repositories for inplace
Users who are interested in inplace compare it to the libraries listed below.
- Full-speed Array of Structures access ☆176 · Apr 25, 2023 · Updated 2 years ago
- Oh My Fast Postgres! ☆11 · Feb 4, 2023 · Updated 3 years ago
- Fast interpolative decompositions in Python ☆10 · Jan 4, 2021 · Updated 5 years ago
- An experimental method JIT for CPython 3 ☆29 · May 18, 2016 · Updated 9 years ago
- Fast multidimensional algorithms ☆18 · Feb 8, 2020 · Updated 6 years ago
- ☆11 · Dec 5, 2018 · Updated 7 years ago
- ☆11 · Aug 8, 2021 · Updated 4 years ago
- Distributed machine learning platform ☆13 · Aug 20, 2015 · Updated 10 years ago
- Tensor Contraction Code Generator ☆39 · Aug 14, 2017 · Updated 8 years ago
- A collection of bit manipulation routines for C++ ☆21 · Jul 24, 2013 · Updated 12 years ago
- The SparseX sparse kernel optimization library ☆43 · Jan 16, 2019 · Updated 7 years ago
- Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner ☆21 · Sep 12, 2025 · Updated 5 months ago
- Graph Intermediate Representation (GIR) library for ML ☆23 · Mar 18, 2017 · Updated 8 years ago
- OSDI 2023 Welder, deep learning compiler ☆32 · Nov 24, 2023 · Updated 2 years ago
- Data Parallel Python ☆209 · May 10, 2013 · Updated 12 years ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications ☆28 · Nov 27, 2016 · Updated 9 years ago
- MPI accelerator-integrated communication extensions ☆39 · Apr 4, 2023 · Updated 2 years ago
- Library for fast image convolution in neural networks on Intel Architecture ☆30 · Jun 25, 2017 · Updated 8 years ago
- Probabilistic multi-sensor geophysical inversions on clusters ☆10 · Dec 18, 2017 · Updated 8 years ago
- ext_mpi_collectives ☆11 · Apr 1, 2025 · Updated 11 months ago
- ☆38 · Oct 3, 2023 · Updated 2 years ago
- Open-source stochastic GW software ☆13 · Apr 28, 2025 · Updated 10 months ago
- High Performance Computing for Weather and Climate ☆42 · Feb 3, 2026 · Updated last month
- A lightweight, high-performance coroutine implementation ☆39 · Oct 16, 2012 · Updated 13 years ago
- Build and run container environment for LFRic ☆10 · Jan 8, 2024 · Updated 2 years ago
- ☆11 · Aug 7, 2024 · Updated last year
- Vikunja is a performance portable algorithm library that defines functions operating on ranges of elements for a variety of purposes. It… ☆16 · Oct 10, 2023 · Updated 2 years ago
- Distributed Communication-Optimal Shuffle and Transpose Algorithm ☆14 · Feb 20, 2026 · Updated 2 weeks ago
- SR-VAE ☆10 · Jul 26, 2021 · Updated 4 years ago
- Collection of modules to build distributed and reliable concurrent systems in Python. ☆206 · Sep 14, 2013 · Updated 12 years ago
- This project provides a series of modules which enable functions of the Vascular Modeling Toolkit (http://www.vmtk.org) in 3D Slicer (htt… ☆16 · Mar 22, 2013 · Updated 12 years ago
- mallocMC: Memory Allocator for Many Core Architectures ☆58 · Feb 2, 2026 · Updated last month
- Use CUDA intrinsics with user-defined types ☆48 · Aug 14, 2014 · Updated 11 years ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆185 · Dec 12, 2022 · Updated 3 years ago
- Serialization component for the Asphalt framework ☆11 · Mar 2, 2026 · Updated last week
- Code for blog post on r-squared ☆13 · Jul 25, 2016 · Updated 9 years ago
- Hyperoctree construction and manipulation ☆11 · Jan 4, 2021 · Updated 5 years ago
- Building a toy OS in Rust. ☆13 · Sep 14, 2016 · Updated 9 years ago
- 🔧 SQL for csv file in UNIX command line with awk. ☆16 · Aug 6, 2022 · Updated 3 years ago