andravin / spio
Experimental CUDA kernel framework unifying typed dimensions, NVRTC JIT specialization, and ML‑guided tuning.
☆43Updated this week
Alternatives and similar repositories for spio
Users that are interested in spio are comparing it to the libraries listed below
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new…☆123Updated last year
- Supercharge Your PyTorch Image Models: Bag of Tricks to 8x Faster Inference with ONNX Runtime & Optimizations☆23Updated last year
- ☆75Updated 3 years ago
- Timm model explorer☆42Updated last year
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.☆159Updated last year
- ☆134Updated 2 years ago
- ☆59Updated last year
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch☆134Updated last month
- Easily run PyTorch on multiple GPUs & machines☆52Updated this week
- Utilities for PyTorch distributed☆25Updated 8 months ago
- ☆51Updated last year
- Mobile Viewer for W&B, built on top of Flutter.☆38Updated last year
- ☆29Updated 4 months ago
- The AdEMAMix Optimizer: Better, Faster, Older.☆186Updated last year
- FID computation in Jax/Flax.☆29Updated last year
- Fast, Modern, and Low Precision PyTorch Optimizers☆116Updated 2 months ago
- Little article showing how to load PyTorch models with linear memory consumption☆34Updated 3 years ago
- ViT inference in Triton because, why not?☆32Updated last year
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k.☆22Updated 2 years ago
- A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to fac…☆244Updated last month
- ☆91Updated last year
- Code and weights for the paper "Cluster and Predict Latents Patches for Improved Masked Image Modeling"☆123Updated 7 months ago
- Supporting PyTorch FSDP for optimizers☆84Updated 11 months ago
- Presents comprehensive benchmarks of XLA-compatible pre-trained models in Keras.☆37Updated 2 years ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds☆324Updated 4 months ago
- Context Manager to profile the forward and backward times of PyTorch's nn.Module☆82Updated 2 years ago
- [ICCV25] Official Implementation of LeGrad☆82Updated last year
- Cyclemoid implementation for PyTorch☆90Updated 3 years ago
- ☆39Updated last year
- Implementation of fused cosine similarity attention in the same style as Flash Attention☆216Updated 2 years ago