dorpxam / einops-cpp
C++17 implementation of einops for libtorch: clear and reliable tensor manipulations with Einstein-like notation.
☆11 · Updated 2 years ago
Alternatives and similar repositories for einops-cpp
Users interested in einops-cpp are comparing it to the libraries listed below.
- ☆16 · Updated last year
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆67 · Updated last month
- Attention in SRAM on Tenstorrent Grayskull ☆40 · Updated last year
- ☆49 · Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆106 · Updated 7 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆49 · Updated 5 months ago
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆96 · Updated 3 weeks ago
- A novel, highly optimized CUDA implementation of the k-means algorithm. ☆41 · Updated 3 years ago
- A library of GPU kernels for sparse matrix operations. ☆283 · Updated 5 years ago
- Automatic differentiation for Triton Kernels ☆29 · Updated 5 months ago
- A Library for fast Hash Tables on GPUs ☆132 · Updated 3 months ago
- ☆34 · Updated 4 years ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆64 · Updated 2 weeks ago
- Sparse-dense matrix-matrix multiplication on GPUs ☆14 · Updated 7 years ago
- High-Performance FP32 GEMM on CUDA devices ☆117 · Updated last year
- ☆14 · Updated 10 months ago
- A warp-oriented dynamic hash table for GPUs ☆76 · Updated 2 years ago
- A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions. ☆27 · Updated last week
- A Data-Centric Compiler for Machine Learning ☆85 · Updated last month
- ☆104 · Updated last year
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆739 · Updated this week
- Fast and full-featured Matrix Market I/O library for C++, Python, and R ☆86 · Updated last year
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆380 · Updated this week
- Fast and Furious AMD Kernels ☆348 · Updated 2 weeks ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆200 · Updated last week
- Step by step implementation of a fast softmax kernel in CUDA ☆60 · Updated last year
- ☆53 · Updated 9 months ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆142 · Updated 2 years ago
- MLIR-based partitioning system ☆164 · Updated last week
- Some CUDA design patterns and a bit of template magic for CUDA ☆158 · Updated 2 years ago