kyegomez / MegaVIT
An open-source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
☆25 · Updated last week
Related projects:
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆28 · Updated 2 months ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models. ☆27 · Updated 5 months ago
- A simple reproducible template to implement AI research papers. ☆21 · Updated last week
- Code and data for the paper "SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data". ☆30 · Updated 6 months ago
- Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene… ☆29 · Updated 2 years ago
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy". ☆92 · Updated last week
- Experimental scripts for researching data-adaptive learning rate scheduling. ☆23 · Updated 11 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents". ☆36 · Updated 5 months ago
- Official PyTorch implementation of Self-emerging Token Labeling. ☆30 · Updated 5 months ago
- Official implementation of the paper "Interfacing Foundation Models' Embeddings". ☆107 · Updated last month
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆96 · Updated 4 months ago
- Implementation of PaLM2-VAdapter from the multimodal model paper "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron… ☆17 · Updated last week
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis". ☆46 · Updated 3 weeks ago
- JAX implementation of ViT-VQGAN. ☆77 · Updated last year
- An interactive demo based on Segment-Anything for stroke-based painting, which enables human-like painting. ☆34 · Updated last year
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆22 · Updated 7 months ago
- Official repository of the paper "Subobject-level Image Tokenization". ☆58 · Updated 4 months ago
- Implementation of the text-to-video model LUMIERE from the paper "A Space-Time Diffusion Model for Video Generation" by Google Research. ☆50 · Updated last week
- Repository for the paper "TiC-CLIP: Continual Training of CLIP Models". ☆90 · Updated 3 months ago
- A curated list of papers and resources for text-to-image evaluation. ☆26 · Updated last year
- Efficient Multi-modal Models via Stage-wise Visual Context Compression. ☆34 · Updated last month
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges. ☆30 · Updated 11 months ago
- Implementation of TiTok, proposed by ByteDance in "An Image is Worth 32 Tokens for Reconstruction and Generation". ☆159 · Updated 3 months ago