dorpxam / einops-cpp
C++17 implementation of einops for libtorch - clear and reliable tensor manipulations with Einstein-like notation
☆10 · Updated last year
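einops-cpp mirrors the Python einops rearrange notation, where a pattern such as `"b c h w -> b (h w) c"` names each axis and describes the output layout. As a rough sketch of what that pattern means, here is the same transform written in plain NumPy; the mapping to einops-cpp's exact C++ API is an assumption based on the Python library:

```python
import numpy as np

# Input tensor with axes b=2, c=3, h=4, w=5 (names are illustrative).
x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# "b c h w -> b (h w) c": move the channel axis last, then merge h and w
# into a single axis of length h*w.
y = x.transpose(0, 2, 3, 1).reshape(2, 4 * 5, 3)

print(y.shape)  # (2, 20, 3)
```

The appeal of the notation is that the pattern string documents the intent in one place, instead of spreading it across separate `transpose` and `reshape` calls.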
Alternatives and similar repositories for einops-cpp:
Users interested in einops-cpp are comparing it to the libraries listed below.
- Extensible collectives library in Triton ☆86 · Updated last month
- High-Performance SGEMM on CUDA devices ☆90 · Updated 3 months ago
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆61 · Updated last month
- ☆16 · Updated 7 months ago
- Sparsity support for PyTorch ☆34 · Updated last month
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆102 · Updated this week
- FlexAttention w/ FlashAttention3 support ☆26 · Updated 7 months ago
- Experiment of using Tangent to autodiff Triton ☆78 · Updated last year
- ☆26 · Updated last year
- A simple but fast implementation of matrix multiplication in CUDA. ☆34 · Updated 9 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆84 · Updated last week
- Automatic differentiation for Triton kernels ☆11 · Updated last month
- Accelerated first-order parallel associative scan ☆182 · Updated 8 months ago
- ☆10 · Updated 2 years ago
- ☆36 · Updated 4 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆35 · Updated 9 months ago
- Material for the SC22 Deep Learning at Scale tutorial ☆41 · Updated last year
- ☆18 · Updated 5 years ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆122 · Updated this week
- Einsum optimization using opt_einsum and PyTorch FX graph rewriting ☆21 · Updated 3 years ago
- Reference kernels for the leaderboard ☆42 · Updated last week
- A library for unit scaling in PyTorch ☆125 · Updated 5 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆70 · Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 9 months ago
- JAX bindings for Flash Attention v2 ☆89 · Updated 9 months ago
- ☆202 · Updated last week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆261 · Updated this week
- ☆21 · Updated 2 months ago
- ☆51 · Updated 8 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆115 · Updated 5 months ago
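Several of the entries above (the einsum-optimization and matrix-multiplication projects) revolve around Einstein-summation contractions, the same notation family einops draws on. As a minimal illustration of the kind of contraction these tools accelerate (shapes chosen arbitrarily), a chained matrix product can be expressed as a single einsum:

```python
import numpy as np

# Three factors of a chained matrix product.
a = np.random.rand(8, 3)
b = np.random.rand(3, 4)
c = np.random.rand(4, 6)

# One Einstein-summation contraction over shared indices j and k.
# Libraries like opt_einsum pick the cheapest contraction order;
# np.einsum's optimize=True flag applies the same idea.
out = np.einsum("ij,jk,kl->il", a, b, c, optimize=True)

# Equivalent to two ordinary matrix multiplications.
ref = a @ b @ c
print(np.allclose(out, ref))  # True
```

The contraction order matters for cost: contracting `(a @ b)` first builds an 8x4 intermediate, while a poor order could build a larger one, which is exactly what einsum optimizers avoid.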