abhijangda / fastkron
☆20 Updated 8 months ago
Alternatives and similar repositories for fastkron
Users that are interested in fastkron are comparing it to the libraries listed below
- Sparsity support for PyTorch ☆37 Updated 7 months ago
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆65 Updated 3 weeks ago
- Parallel framework for training and fine-tuning deep neural networks ☆65 Updated 2 weeks ago
- A Data-Centric Compiler for Machine Learning ☆85 Updated last year
- High-Performance SGEMM on CUDA devices ☆109 Updated 9 months ago
- ☆60 Updated last year
- ☆83 Updated last year
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆35 Updated last week
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) ☆95 Updated 11 months ago
- Personal solutions to the Triton Puzzles ☆20 Updated last year
- ☆33 Updated last year
- extensible collectives library in triton ☆90 Updated 7 months ago
- Collection of kernels written in Triton language ☆161 Updated 7 months ago
- This is a repository with examples to run inference endpoints on various ALCF clusters ☆26 Updated last week
- ☆20 Updated 6 years ago
- ☆28 Updated 9 months ago
- train with kittens! ☆63 Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 Updated last week
- Experiment of using Tangent to autodiff triton ☆79 Updated last year
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆330 Updated 10 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 Updated 11 months ago
- This repository contains the official code for Energy Transformer---an efficient Energy-based Transformer variant for graph classificatio… ☆25 Updated last year
- ☆53 Updated last year
- High dimensional black-box optimizer using Latent Action Monte Carlo Tree Search algorithm ☆29 Updated 3 years ago
- Automatic differentiation for Triton Kernels ☆28 Updated 2 months ago
- Train across all your devices, ezpz 🍋 ☆24 Updated 2 weeks ago
- Accelerated First Order Parallel Associative Scan ☆189 Updated last year
- ☆112 Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆110 Updated last year
- Parallelizing non-linear sequential models over the sequence length ☆54 Updated 4 months ago