indri-voice / vit.triton
ViT inference in Triton, because why not?
☆22 · Updated 7 months ago
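For context on what this repo does: ViT inference starts by cutting the input image into fixed-size patches and flattening each one into a token. A minimal NumPy sketch of that patch-extraction step is below; it is illustrative only and not taken from vit.triton's Triton kernels, and the function name is our own.

```python
import numpy as np

def extract_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened non-overlapping patches,
    the first step of ViT inference (hypothetical helper, not this repo's code)."""
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    ph, pw = H // patch_size, W // patch_size
    # (ph, P, pw, P, C) -> (ph, pw, P, P, C) -> (ph*pw, P*P*C)
    patches = image.reshape(ph, patch_size, pw, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(ph * pw, patch_size * patch_size * C)

img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = extract_patches(img)
print(tokens.shape)  # (196, 768)
```

Each of the 196 tokens is then linearly projected and fed through the transformer; a Triton implementation fuses steps like this into custom GPU kernels.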
Alternatives and similar repositories for vit.triton:
Users interested in vit.triton are comparing it to the libraries listed below.
- ☆52 · Updated last week
- Patch convolution to avoid large GPU memory usage of Conv2D ☆81 · Updated 7 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN ☆67 · Updated 7 months ago
- Efficient CUDA kernels for training convolutional neural networks with PyTorch ☆38 · Updated last month
- Implementation of Infini-Transformer in PyTorch ☆107 · Updated 2 weeks ago
- ☆75 · Updated 6 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆157 · Updated this week
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆50 · Updated 4 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆91 · Updated 5 months ago
- Experiment of using Tangent to autodiff Triton ☆74 · Updated 11 months ago
- A single repo with all scripts and utils to train or fine-tune the Mamba model, with or without FIM ☆50 · Updated 9 months ago
- FlashRNN: fast RNN kernels with I/O awareness ☆69 · Updated last month
- An algorithm for static activation quantization of LLMs ☆107 · Updated last week
- ☆49 · Updated last year
- ☆31 · Updated 7 months ago
- Tritonbench, a collection of PyTorch custom operators with example inputs to measure their performance ☆75 · Updated this week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 · Updated 3 months ago
- Mobile viewer for W&B, built on top of Flutter ☆32 · Updated 10 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆35 · Updated 7 months ago
- Timm model explorer ☆36 · Updated 9 months ago
- Tests of various linear attention designs ☆58 · Updated 8 months ago
- ☆138 · Updated 11 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆79 · Updated this week
- imagetokenizer, a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video… ☆30 · Updated 6 months ago
- ☆31 · Updated 7 months ago
- PB-LLM: Partially Binarized Large Language Models ☆150 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆115 · Updated 4 months ago
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆38 · Updated 2 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆21 · Updated 3 weeks ago