dorpxam / einops-cpp
C++17 implementation of einops for libtorch - clear and reliable tensor manipulations with Einstein-like notation
☆10 · Updated last year
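einops-cpp mirrors the Python einops rearrange notation, where a pattern such as `"b c h w -> b (h w) c"` names each axis and describes the output layout. As a rough sketch of what that pattern means, here is the same transform written in plain NumPy; the mapping to einops-cpp's exact C++ API is an assumption based on the Python library:

```python
import numpy as np

# Input tensor with axes b=2, c=3, h=4, w=5 (names are illustrative).
x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# "b c h w -> b (h w) c": move the channel axis last, then merge h and w
# into a single axis of length h*w.
y = x.transpose(0, 2, 3, 1).reshape(2, 4 * 5, 3)

print(y.shape)  # (2, 20, 3)
```

The appeal of the notation is that the pattern string documents the intent in one place, instead of spreading it across separate `transpose` and `reshape` calls.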
Alternatives and similar repositories for einops-cpp:
Users interested in einops-cpp are comparing it to the libraries listed below.
- Extensible collectives library in Triton ☆86 · Updated last month
- High-Performance SGEMM on CUDA devices ☆90 · Updated 3 months ago
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆61 · Updated last month
- ☆16 · Updated 7 months ago
- Sparsity support for PyTorch ☆34 · Updated last month
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆102 · Updated this week
- FlexAttention w/ FlashAttention3 support ☆26 · Updated 7 months ago
- Experiment of using Tangent to autodiff Triton ☆78 · Updated last year
- ☆26 · Updated last year
- A simple but fast implementation of matrix multiplication in CUDA. ☆34 · Updated 9 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆84 · Updated last week
- Automatic differentiation for Triton kernels ☆11 · Updated last month
- Accelerated first-order parallel associative scan ☆182 · Updated 8 months ago
- ☆10 · Updated 2 years ago
- ☆36 · Updated 4 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆35 · Updated 9 months ago
- Material for the SC22 Deep Learning at Scale tutorial ☆41 · Updated last year
- ☆18 · Updated 5 years ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆122 · Updated this week
- Einsum optimization using opt_einsum and PyTorch FX graph rewriting ☆21 · Updated 3 years ago
- Reference kernels for the leaderboard ☆42 · Updated last week
- A library for unit scaling in PyTorch ☆125 · Updated 5 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆70 · Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 9 months ago
- JAX bindings for Flash Attention v2 ☆89 · Updated 9 months ago
- ☆202 · Updated last week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆261 · Updated this week
- ☆21 · Updated 2 months ago
- ☆51 · Updated 8 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆115 · Updated 5 months ago
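Several of the entries above (the einsum-optimization and matrix-multiplication projects) revolve around Einstein-summation contractions, the same notation family einops draws on. As a minimal illustration of the kind of contraction these tools accelerate (shapes chosen arbitrarily), a chained matrix product can be expressed as a single einsum:

```python
import numpy as np

# Three factors of a chained matrix product.
a = np.random.rand(8, 3)
b = np.random.rand(3, 4)
c = np.random.rand(4, 6)

# One Einstein-summation contraction over shared indices j and k.
# Libraries like opt_einsum pick the cheapest contraction order;
# np.einsum's optimize=True flag applies the same idea.
out = np.einsum("ij,jk,kl->il", a, b, c, optimize=True)

# Equivalent to two ordinary matrix multiplications.
ref = a @ b @ c
print(np.allclose(out, ref))  # True
```

The contraction order matters for cost: contracting `(a @ b)` first builds an 8x4 intermediate, while a poor order could build a larger one, which is exactly what einsum optimizers avoid.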