indri-voice / vit.triton
ViT inference in Triton, because why not?
☆22 · Updated 7 months ago
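For context on what this repo does: ViT inference starts by cutting the input image into fixed-size patches and flattening each one into a token. A minimal NumPy sketch of that patch-extraction step is below; it is illustrative only and not taken from vit.triton's Triton kernels, and the function name is our own.

```python
import numpy as np

def extract_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened non-overlapping patches,
    the first step of ViT inference (hypothetical helper, not this repo's code)."""
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    ph, pw = H // patch_size, W // patch_size
    # (ph, P, pw, P, C) -> (ph, pw, P, P, C) -> (ph*pw, P*P*C)
    patches = image.reshape(ph, patch_size, pw, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(ph * pw, patch_size * patch_size * C)

img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = extract_patches(img)
print(tokens.shape)  # (196, 768)
```

Each of the 196 tokens is then linearly projected and fed through the transformer; a Triton implementation fuses steps like this into custom GPU kernels.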
Alternatives and similar repositories for vit.triton:
Users interested in vit.triton are comparing it to the libraries listed below.
- ☆52 · Updated last week
- Patch convolution to avoid large GPU memory usage of Conv2D ☆81 · Updated 7 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN ☆67 · Updated 7 months ago
- Efficient CUDA kernels for training convolutional neural networks with PyTorch ☆38 · Updated last month
- Implementation of Infini-Transformer in PyTorch ☆107 · Updated 2 weeks ago
- ☆75 · Updated 6 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆157 · Updated this week
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆50 · Updated 4 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆91 · Updated 5 months ago
- Experiment of using Tangent to autodiff Triton ☆74 · Updated 11 months ago
- A single repo with all scripts and utils to train or fine-tune the Mamba model, with or without FIM ☆50 · Updated 9 months ago
- FlashRNN: fast RNN kernels with I/O awareness ☆69 · Updated last month
- An algorithm for static activation quantization of LLMs ☆107 · Updated last week
- ☆49 · Updated last year
- ☆31 · Updated 7 months ago
- Tritonbench, a collection of PyTorch custom operators with example inputs to measure their performance ☆75 · Updated this week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 · Updated 3 months ago
- Mobile viewer for W&B, built on top of Flutter ☆32 · Updated 10 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆35 · Updated 7 months ago
- Timm model explorer ☆36 · Updated 9 months ago
- Tests of various linear attention designs ☆58 · Updated 8 months ago
- ☆138 · Updated 11 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆79 · Updated this week
- imagetokenizer, a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video… ☆30 · Updated 6 months ago
- ☆31 · Updated 7 months ago
- PB-LLM: Partially Binarized Large Language Models ☆150 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆115 · Updated 4 months ago
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆38 · Updated 2 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆21 · Updated 3 weeks ago