abhijangda / fastkron
☆16Updated last month
Alternatives and similar repositories for fastkron:
Users who are interested in fastkron are comparing it to the libraries listed below.
- Sparsity support for PyTorch☆33Updated last week
- EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks…☆53Updated last week
- Personal solutions to the Triton Puzzles☆18Updated 6 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.☆43Updated 6 months ago
- Explore training for quantized models☆13Updated 3 weeks ago
- FlexAttention w/ FlashAttention3 Support☆27Updated 3 months ago
- SGEMM that beats cuBLAS☆68Updated last week
- Memory Optimizations for Deep Learning (ICML 2023)☆62Updated 10 months ago
- ☆12Updated 3 years ago
- ☆24Updated 2 weeks ago
- Fast Hadamard transform in CUDA, with a PyTorch interface☆135Updated 8 months ago
- Proof-of-concept of global switching between numpy/jax/pytorch in a library.☆18Updated 7 months ago
- High-dimensional black-box optimizer using the Latent Action Monte Carlo Tree Search algorithm☆26Updated 2 years ago
- Experiment in using Tangent to autodiff Triton☆74Updated last year
- A parallel framework for training deep neural networks☆50Updated last week
- ☆64Updated 2 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆38Updated 8 months ago
- Implementation of Spectral State Space Models☆16Updated 11 months ago
- Extensible collectives library in Triton☆77Updated 4 months ago
- ☆58Updated 8 months ago
- ☆21Updated 3 months ago
- Collection of kernels written in the Triton language☆91Updated 3 months ago
- JAX implementation of "Fine-Tuning Language Models with Just Forward Passes"☆19Updated last year
- LLM training in simple, raw C/CUDA☆91Updated 8 months ago
- Fast and memory-efficient exact attention☆57Updated last month
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/☆31Updated 4 months ago
- ☆15Updated 4 months ago
- Official Implementation of "CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks"☆17Updated last month
- A port of the Mistral-7B model to JAX☆30Updated 6 months ago
- ☆97Updated 5 months ago