NVlabs / parrot
Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without unnecessary intermediate materializations.
☆247 · Updated last month
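To make "lazy evaluation without intermediate materializations" concrete, here is a minimal host-side sketch in plain C++. It does not use Parrot's API or CUDA/Thrust; the names `VectorRef`, `LazyMap`, `lazy_map`, `sum`, and `sum_of_scaled_squares` are hypothetical and exist only for illustration. Each chained operation stores a function rather than a result, and the final reduction walks the input once.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only: none of these names come from Parrot.

// Non-owning, read-only view over a vector, so chaining copies no data.
struct VectorRef {
    const std::vector<double>* data;
    std::size_t size() const { return data->size(); }
    double operator[](std::size_t i) const { return (*data)[i]; }
};

// A lazy "map" node: it stores the source expression and the function,
// and computes an element only when that element is requested.
template <typename Source, typename F>
struct LazyMap {
    Source source;
    F f;
    std::size_t size() const { return source.size(); }
    auto operator[](std::size_t i) const { return f(source[i]); }
};

// Builds a lazy map node; no element is evaluated here.
template <typename Source, typename F>
LazyMap<Source, F> lazy_map(Source source, F f) {
    return LazyMap<Source, F>{source, f};
}

// The only place where elements are evaluated: one pass over the input,
// with every chained function applied per element (the "fused" loop).
template <typename Expr>
double sum(const Expr& expr) {
    double total = 0.0;
    for (std::size_t i = 0; i < expr.size(); ++i) {
        total += expr[i];
    }
    return total;
}

// Chains square -> scale -> sum without allocating temporary arrays.
double sum_of_scaled_squares(const std::vector<double>& values, double scale) {
    auto squared = lazy_map(VectorRef{&values}, [](double x) { return x * x; });
    auto scaled = lazy_map(squared, [scale](double x) { return scale * x; });
    return sum(scaled);
}
```

A GPU library would typically run the fused loop as a single kernel instead of a host loop, but the idea of deferring work until the final consumer is the same.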
Alternatives and similar repositories for parrot
Users who are interested in parrot are comparing it to the libraries listed below.
- Fast, easy automatic differentiation in C++ ☆409 · Updated 2 weeks ago
- The project provides high-performance concurrency, enabling highly parallel computation. ☆234 · Updated 3 weeks ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆56 · Updated this week
- CUDA kernel author's tools ☆115 · Updated 3 years ago
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast! ☆116 · Updated 6 months ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated 2 years ago
- A Low-Level Abstraction of Memory Access ☆93 · Updated last year
- ☆150 · Updated last year
- LLM training in simple, raw C/CUDA ☆112 · Updated last year
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line ☆24 · Updated 2 months ago
- Distributed ranges is a generalization of C++ ranges for distributed data structures. ☆51 · Updated 4 months ago
- pika is a C++ tasking library built on std::execution with fibers, CUDA, HIP, and MPI support. ☆79 · Updated 3 weeks ago
- Counter-based random number generators for C, C++ and CUDA. ☆112 · Updated last year
- A highly optimised C++ library for mathematical applications and neural networks. ☆177 · Updated 5 months ago
- C++23 Tensor, neural networks and mathematical library ☆42 · Updated this week
- FastAD is a C++ implementation of automatic differentiation both forward and reverse mode. ☆118 · Updated 2 years ago
- Reference Implementation for stdBLAS ☆154 · Updated this week
- A fast implementation of log() and exp() ☆56 · Updated 3 years ago
- C++ HPC Tutorial materials ☆54 · Updated 3 months ago
- Agenium Scale vectorization library for CPUs and GPUs ☆337 · Updated 4 years ago
- Exploring using stdpar and Cython ☆34 · Updated 5 years ago
- An implementation of HIP that works on CPUs, across OSes. ☆131 · Updated last year
- Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template… ☆366 · Updated last year
- A simple and fast minimalistic header-only library allowing to run async tasks and execute task graphs. ☆61 · Updated last year
- A Clang-based C++ Interoperability Library ☆87 · Updated last week
- C++ Default Guidelines ☆147 · Updated 2 weeks ago
- High-level C++ for Accelerator Clusters ☆154 · Updated 2 months ago
- DLA-Future ☆82 · Updated 2 months ago
- A massively-parallel, block-sparse tensor framework written in C++ ☆313 · Updated this week
- Reference implementation of the draft C++ GraphBLAS specification. ☆32 · Updated 11 months ago