abhijangda / fastkron
☆22 · Updated 9 months ago
Alternatives and similar repositories for fastkron
Users who are interested in fastkron are comparing it to the libraries listed below.
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks… ☆66 · Updated 2 months ago
- Sparsity support for PyTorch ☆37 · Updated 8 months ago
- C++ and Python libraries for neural networks. ☆18 · Updated 3 weeks ago
- This repository contains the official code for Energy Transformer, an efficient Energy-based Transformer variant for graph classificatio… ☆25 · Updated last year
- ☆83 · Updated 2 years ago
- Physics-inspired transformer modules based on mean-field dynamics of vector-spin models in JAX ☆45 · Updated 2 years ago
- High-Performance SGEMM on CUDA devices ☆113 · Updated 10 months ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 · Updated 2 years ago
- Experiment of using Tangent to autodiff triton ☆81 · Updated last year
- JAX implementation of the Mistral 7b v0.2 model ☆35 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆64 · Updated 2 weeks ago
- Personal solutions to the Triton Puzzles ☆20 · Updated last year
- Implementation of Forward Laplacian algorithm in JAX ☆92 · Updated 3 weeks ago
- ☆33 · Updated last year
- ☆62 · Updated last year
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆171 · Updated last month
- Proof-of-concept of global switching between numpy/jax/pytorch in a library. ☆18 · Updated last year
- Automatic differentiation for Triton Kernels ☆29 · Updated 4 months ago
- Collection of kernels written in Triton language ☆173 · Updated 8 months ago
- Einsum-like high-level array sharding API for JAX ☆34 · Updated last year
- Parallel framework for training and fine-tuning deep neural networks ☆70 · Updated last month
- train with kittens! ☆63 · Updated last year
- Parallelizing non-linear sequential models over the sequence length ☆56 · Updated 5 months ago
- Multiple dispatch over abstract array types in JAX. ☆136 · Updated 3 weeks ago
- Fast and memory-efficient exact attention ☆74 · Updated 9 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆135 · Updated 11 months ago
- This is a repository with examples to run inference endpoints on various ALCF clusters ☆26 · Updated last month
- Compressing Large Language Models using Low Precision and Low Rank Decomposition ☆106 · Updated 3 weeks ago
- Scalable and Stable Parallelization of Nonlinear RNNs ☆27 · Updated last month
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆319 · Updated last week