NVlabs / parrot
Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, so operations can be chained without materializing unnecessary intermediate arrays.
☆248 · Updated 2 weeks ago
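The lazy, fused style described above can be illustrated with a small expression-template sketch in plain C++. It runs on the CPU only and uses no CUDA or Thrust. It is not Parrot's API: `VecView`, `lazy_map`, `lazy_zip`, `materialize`, `lazy_sum`, and `demo_fused_sum` are hypothetical names chosen for illustration. Each operation builds a lightweight node instead of computing a result. Only a terminal call walks the data, and it evaluates the whole chain in a single loop without allocating intermediate arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Leaf node: a non-owning view over existing storage (no copy is made).
struct VecView {
    const std::vector<double>* v;
    double operator[](std::size_t i) const { return (*v)[i]; }
    std::size_t size() const { return v->size(); }
};

inline VecView view(const std::vector<double>& v) { return VecView{&v}; }

// Unary node: applies f to one element only when that element is requested.
template <typename Src, typename F>
struct MapExpr {
    Src src;  // child node, held by value (nodes are small: pointers + functors)
    F f;
    double operator[](std::size_t i) const { return f(src[i]); }
    std::size_t size() const { return src.size(); }
};

template <typename Src, typename F>
MapExpr<Src, F> lazy_map(Src src, F f) {
    return {std::move(src), std::move(f)};
}

// Binary node: combines two child nodes elementwise, again lazily.
template <typename L, typename R, typename F>
struct ZipExpr {
    L lhs;
    R rhs;
    F f;
    double operator[](std::size_t i) const { return f(lhs[i], rhs[i]); }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R, typename F>
ZipExpr<L, R, F> lazy_zip(L lhs, R rhs, F f) {
    assert(lhs.size() == rhs.size());  // elementwise ops need equal lengths
    return {std::move(lhs), std::move(rhs), std::move(f)};
}

// Terminal operations: the entire chain is evaluated inside one loop here.
template <typename E>
std::vector<double> materialize(const E& e) {
    std::vector<double> out(e.size());
    for (std::size_t i = 0; i < e.size(); ++i) out[i] = e[i];
    return out;
}

template <typename E>
double lazy_sum(const E& e) {
    double acc = 0.0;
    for (std::size_t i = 0; i < e.size(); ++i) acc += e[i];
    return acc;
}

// Example: sum((x + y)^2) without storing x + y or its square anywhere.
inline double demo_fused_sum() {
    const std::vector<double> x{1.0, 2.0, 3.0};
    const std::vector<double> y{4.0, 5.0, 6.0};
    auto expr = lazy_map(lazy_zip(view(x), view(y), std::plus<>{}),
                         [](double t) { return t * t; });
    return lazy_sum(expr);  // one pass: 5*5 + 7*7 + 9*9 = 155
}
```

The views store pointers, so the source vectors must outlive any expression built from them. On a GPU, a comparable effect is commonly achieved with Thrust's fancy iterators (such as `thrust::transform_iterator` and `thrust::zip_iterator`) fed into a single `thrust::reduce` or `thrust::transform` call. Whether Parrot is built exactly this way is an assumption; the description above does not say.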
Alternatives and similar repositories for parrot
Users who are interested in parrot compare it to the libraries listed below.
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆57 · Updated last week
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast! ☆116 · Updated 6 months ago
- The project provides high-performance concurrency, enabling highly parallel computation. ☆236 · Updated last week
- CUDA kernel author's tools ☆116 · Updated 3 years ago
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line ☆24 · Updated 2 months ago
- Fast, easy automatic differentiation in C++ ☆410 · Updated this week
- LLM training in simple, raw C/CUDA ☆112 · Updated last year
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated 2 years ago
- High-level C++ for Accelerator Clusters ☆154 · Updated 2 months ago
- Distributed ranges is a generalization of C++ ranges for distributed data structures. ☆51 · Updated 4 months ago
- Abstraction Library for Parallel Kernel Acceleration ☆404 · Updated 2 weeks ago
- C++ template metaprogram driven tensor math library ☆90 · Updated 2 weeks ago
- pika is a C++ tasking library built on std::execution with fibers, CUDA, HIP, and MPI support. ☆80 · Updated last week
- C++ HPC Tutorial materials ☆54 · Updated 3 months ago
- A Low-Level Abstraction of Memory Access ☆93 · Updated last year
- An implementation of HIP that works on CPUs, across OSes. ☆131 · Updated last year
- FastAD is a C++ implementation of automatic differentiation both forward and reverse mode. ☆118 · Updated 2 years ago
- Exploring using stdpar and Cython ☆34 · Updated 5 years ago
- C++ Library for Portable SIMD Vectorization ☆84 · Updated last year
- Counter-based random number generators for C, C++ and CUDA. ☆115 · Updated last year
- Reference Implementation for stdBLAS ☆155 · Updated 2 weeks ago
- clad -- automatic differentiation for C/C++ ☆385 · Updated last week
- Light and self-contained implementation of C++17 parallel algorithms. ☆38 · Updated last year
- A highly optimised C++ library for mathematical applications and neural networks. ☆178 · Updated 5 months ago
- ☆44 · Updated this week
- Improve the usage experience of std::simd (Parallelism TS 2) ☆30 · Updated 5 months ago
- Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template… ☆366 · Updated last year
- Reference implementation of the draft C++ GraphBLAS specification. ☆32 · Updated 11 months ago
- Omnitrace: Application Profiling, Tracing, and Analysis ☆346 · Updated 3 weeks ago
- TTG: Template Task Graph C++ API ☆26 · Updated 2 months ago