NVlabs / parrot
Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without unnecessary intermediate materializations.
☆243 Updated 3 weeks ago
Alternatives and similar repositories for parrot
Users who are interested in parrot are comparing it to the libraries listed below.
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆55 Updated last week
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast! ☆115 Updated 5 months ago
- The project provides high-performance concurrency, enabling highly parallel computation. ☆230 Updated this week
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated 2 years ago
- CUDA kernel author's tools ☆115 Updated 3 years ago
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line ☆24 Updated last month
- A Low-Level Abstraction of Memory Access ☆93 Updated last year
- LLM training in simple, raw C/CUDA ☆109 Updated last year
- Distributed ranges is a generalization of C++ ranges for distributed data structures. ☆52 Updated 3 months ago
- Fast, easy automatic differentiation in C++ ☆404 Updated this week
- FastAD is a C++ implementation of automatic differentiation both forward and reverse mode. ☆118 Updated 2 years ago
- ☆150 Updated last year
- An implementation of HIP that works on CPUs, across OSes. ☆131 Updated last year
- ☆70 Updated last week
- Reference implementation of the draft C++ GraphBLAS specification. ☆32 Updated 10 months ago
- pika is a C++ tasking library built on std::execution with fibers, CUDA, HIP, and MPI support. ☆79 Updated last week
- C++ template library for probabilistic programming ☆51 Updated 5 years ago
- ☆44 Updated this week
- A graph library using modern C++ features (e.g., C++20 ranges) to be as efficient and user-friendly as possible. ☆53 Updated this week
- Counter-based random number generators for C, C++ and CUDA. ☆113 Updated last year
- A Clang-based C++ Interoperability Library ☆86 Updated last week
- A fast implementation of log() and exp() ☆56 Updated 3 years ago
- C++ Library for Portable SIMD Vectorization ☆84 Updated last year
- Omnitrace: Application Profiling, Tracing, and Analysis ☆340 Updated this week
- Abstraction Library for Parallel Kernel Acceleration ☆400 Updated 3 weeks ago
- High-level C++ for Accelerator Clusters ☆155 Updated last month
- Agenium Scale vectorization library for CPUs and GPUs ☆337 Updated 4 years ago
- a CUDA implementation of a priority queue ☆84 Updated 5 years ago
- Struct-of-Arrays generator for C++ projects. ☆60 Updated last year
- C++ HPC Tutorial materials ☆54 Updated 2 months ago