abhijangda / fastkron
☆19 · Updated 6 months ago
Alternatives and similar repositories for fastkron
Users that are interested in fastkron are comparing it to the libraries listed below
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆62 · Updated this week
- Sparsity support for PyTorch ☆36 · Updated 5 months ago
- Personal solutions to the Triton Puzzles ☆20 · Updated last year
- Experiment of using Tangent to autodiff triton ☆80 · Updated last year
- High-Performance SGEMM on CUDA devices ☆99 · Updated 7 months ago
- Proof-of-concept of global switching between numpy/jax/pytorch in a library. ☆18 · Updated last year
- A parallel framework for training deep neural networks ☆63 · Updated 5 months ago
- Collection of kernels written in Triton language ☆152 · Updated 4 months ago
- This repository contains the official code for Energy Transformer---an efficient Energy-based Transformer variant for graph classificatio… ☆25 · Updated last year
- ☆81 · Updated last year
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ☆140 · Updated 4 months ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 · Updated 2 years ago
- ☆32 · Updated 10 months ago
- A Data-Centric Compiler for Machine Learning ☆84 · Updated last year
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆328 · Updated 8 months ago
- Physics-inspired transformer modules based on mean-field dynamics of vector-spin models in JAX ☆41 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆51 · Updated this week
- extensible collectives library in triton ☆88 · Updated 5 months ago
- ☆57 · Updated 10 months ago
- LLM training in simple, raw C/CUDA ☆104 · Updated last year
- A library for unit scaling in PyTorch ☆130 · Updated last month
- ☆15 · Updated 11 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆96 · Updated 2 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆41 · Updated last year
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) ☆94 · Updated 8 months ago
- train with kittens! ☆62 · Updated 10 months ago
- This is a port of Mistral-7B model in JAX ☆32 · Updated last year
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … ☆23 · Updated 3 weeks ago
- Custom kernels in Triton language for accelerating LLMs ☆25 · Updated last year
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆32 · Updated 4 months ago