☆120 · Jun 6, 2024 · Updated last year
Alternatives and similar repositories for DeLVM
Users who are interested in DeLVM are comparing it to the libraries listed below.
- ☆38 · Feb 8, 2024 · Updated 2 years ago
- ☆1,842 · Jun 28, 2024 · Updated last year
- Adapting LLaMA Decoder to Vision Transformer · ☆30 · May 20, 2024 · Updated last year
- ☆15 · May 25, 2024 · Updated last year
- ☆14 · Jul 15, 2024 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks · ☆391 · Jul 9, 2024 · Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) · ☆186 · Jul 5, 2024 · Updated last year
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks · ☆58 · Sep 26, 2024 · Updated last year
- ☆80 · Jul 3, 2024 · Updated last year
- JAX implementation of ViT-VQGAN · ☆82 · Sep 21, 2022 · Updated 3 years ago
- ☆141 · Jun 28, 2024 · Updated last year
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning · ☆111 · May 28, 2025 · Updated 9 months ago
- A huge dataset for Document Visual Question Answering · ☆20 · Jul 29, 2024 · Updated last year
- Emu Series: Generative Multimodal Models from BAAI · ☆1,768 · Jan 12, 2026 · Updated last month
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation · ☆86 · Sep 12, 2024 · Updated last year
- [CVPR2025] Official code repository for SeTa: "Scale Efficient Training for Large Datasets" · ☆23 · Mar 18, 2025 · Updated 11 months ago
- ☆19 · Jul 25, 2024 · Updated last year
- [CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention · ☆39 · Mar 11, 2025 · Updated 11 months ago
- FastMIM, official PyTorch implementation of our paper "FastMIM: Expediting Masked Image Modeling Pre-training for Vision" (https://arxiv.o… · ☆39 · Dec 29, 2022 · Updated 3 years ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences · ☆43 · Mar 11, 2025 · Updated 11 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation · ☆1,937 · Aug 15, 2024 · Updated last year
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation" · ☆27 · Oct 13, 2024 · Updated last year
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" · ☆90 · Oct 12, 2024 · Updated last year
- [AAAI 2023] DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding · ☆57 · Nov 28, 2022 · Updated 3 years ago
- ☆10 · Apr 7, 2025 · Updated 11 months ago
- Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model · ☆13 · Feb 15, 2024 · Updated 2 years ago
- ☆15 · Nov 11, 2024 · Updated last year
- Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth? · ☆145 · Feb 11, 2025 · Updated last year
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" · ☆131 · Aug 21, 2024 · Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts · ☆336 · Jul 17, 2024 · Updated last year
- A repository for DenseSSMs · ☆89 · Apr 11, 2024 · Updated last year
- Simple Implementation of Pix2seqV2 (multi-task) · ☆27 · Dec 16, 2024 · Updated last year
- [NeurIPS 2024] VastTrack: Vast Category Visual Object Tracking · ☆73 · Sep 30, 2025 · Updated 5 months ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding" · ☆13 · Jun 11, 2023 · Updated 2 years ago
- ☆13 · Aug 7, 2025 · Updated 6 months ago
- ☆10 · Dec 3, 2023 · Updated 2 years ago
- ☆25 · Dec 23, 2024 · Updated last year
- [NeurIPS 2024] Repository for the paper "OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking" · ☆27 · Nov 9, 2024 · Updated last year
- When do we not need larger vision models? · ☆413 · Feb 8, 2025 · Updated last year