cmeraki / vit.triton
ViT inference in Triton, because why not?
☆16 · Updated 3 months ago
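For flavor, here is a minimal sketch of the kind of kernel a ViT-in-Triton project typically contains: a fused tanh-approximation GELU for the transformer's MLP blocks. This is an illustrative sketch under assumed names (`gelu_kernel`, `gelu`), not code from this repo.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def gelu_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous BLOCK_SIZE chunk.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    # tanh-approximation GELU: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))),
    # rewritten via tanh(u) = 2 * sigmoid(2u) - 1 so only tl.sigmoid is needed.
    c = 0.7978845608028654  # sqrt(2 / pi)
    u = c * (x + 0.044715 * x * x * x)
    y = x * tl.sigmoid(2.0 * u)
    tl.store(out_ptr + offsets, y, mask=mask)

def gelu(x: torch.Tensor) -> torch.Tensor:
    # Flatten-and-launch wrapper; a 1D grid covering every element.
    x = x.contiguous()
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    gelu_kernel[grid](x, out, n, BLOCK_SIZE=1024)
    return out
```

The sigmoid identity keeps the kernel to an intrinsic (`tl.sigmoid`) that exists across Triton versions, rather than relying on a tanh primitive.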
Related projects:
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch · ☆233 · Updated 4 months ago
- [ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa… · ☆76 · Updated 2 years ago
- Patch convolution to avoid large GPU memory usage of Conv2D · ☆73 · Updated 3 months ago
- Object Recognition as Next Token Prediction (CVPR 2024) · ☆153 · Updated 2 months ago
- Implementation of Infini-Transformer in PyTorch · ☆100 · Updated last month
- A simple minimal implementation of Reversible Vision Transformers · ☆114 · Updated 6 months ago
- [ECCV 2024] Official PyTorch implementation of RoPE-ViT, "Rotary Position Embedding for Vision Transformer" · ☆157 · Updated last month
- Megatron's multi-modal data loader · ☆42 · Updated this week
- A library for unit scaling in PyTorch · ☆94 · Updated 2 weeks ago
- Official implementation of the Law of Vision Representation in MLLMs · ☆93 · Updated last week
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" · ☆46 · Updated 3 weeks ago
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… · ☆115 · Updated last month
- Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" (https://arxiv.org/abs/2303.13496) · ☆75 · Updated last month
- [NeurIPS 2022 Spotlight] Official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity" · ☆66 · Updated last year
- 94% on CIFAR-10 in 3.09 seconds 💨 96% in 27 seconds · ☆127 · Updated last month
- Code for the paper "CiT: Curation in Training for Effective Vision-Language Data" · ☆78 · Updated last year
- A compilation of network architectures for vision and other tasks that avoid the self-attention mechanism · ☆77 · Updated last year
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆33 · Updated 3 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" · ☆104 · Updated 6 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆56 · Updated this week
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) · ☆90 · Updated last year
- Official code for our CVPR'22 paper "Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space" · ☆243 · Updated 11 months ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts · ☆101 · Updated last year
- Timm model explorer · ☆36 · Updated 5 months ago
- VLM Evaluation: benchmark for VLMs, spanning text generation tasks from VQA to captioning · ☆77 · Updated last week
- Explorations into the recently proposed Taylor Series Linear Attention · ☆85 · Updated last month