dorpxam / einops-cpp
C++17 implementation of einops for libtorch: clear and reliable tensor manipulations with Einstein-like notation
☆11 Updated 2 years ago
Alternatives and similar repositories for einops-cpp
Users who are interested in einops-cpp are comparing it to the libraries listed below.
- ☆16 Updated last year
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆67 Updated last month
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆106 Updated 7 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆49 Updated 5 months ago
- ☆14 Updated 10 months ago
- A library of GPU kernels for sparse matrix operations. ☆283 Updated 5 years ago
- A Data-Centric Compiler for Machine Learning ☆85 Updated last month
- High-Performance FP32 GEMM on CUDA devices ☆117 Updated last year
- A Library for fast Hash Tables on GPUs ☆132 Updated 3 months ago
- ☆53 Updated 9 months ago
- Fast and Furious AMD Kernels ☆348 Updated 2 weeks ago
- ☆49 Updated last year
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆93 Updated 2 years ago
- A warp-oriented dynamic hash table for GPUs ☆76 Updated 2 years ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆125 Updated last year
- ☆53 Updated last week
- ☆17 Updated 3 years ago
- Attention in SRAM on Tenstorrent Grayskull ☆40 Updated last year
- A novel, highly optimized CUDA implementation of the k-means algorithm. ☆41 Updated 3 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆111 Updated 2 months ago
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆96 Updated 3 weeks ago
- cuASR: CUDA Algebra for Semirings ☆44 Updated 3 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆137 Updated 3 years ago
- MLIR-based partitioning system ☆164 Updated last week
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS. ☆50 Updated 7 years ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆739 Updated this week
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆192 Updated last year
- Scalable radix top-k selection on GPUs. ☆21 Updated last year
- PyTorch interface for the IPU ☆181 Updated 2 years ago
- A GPU algorithm for sparse matrix-matrix multiplication ☆74 Updated 5 years ago