ggjy / DeLVM
☆120 · Jun 6, 2024 · Updated last year
Alternatives and similar repositories for DeLVM
Users who are interested in DeLVM are comparing it to the repositories listed below.
- ☆38 · Feb 8, 2024 · Updated 2 years ago
- ☆1,841 · Jun 28, 2024 · Updated last year
- Adapting LLaMA Decoder to Vision Transformer · ☆30 · May 20, 2024 · Updated last year
- ☆15 · May 25, 2024 · Updated last year
- ☆14 · Jul 15, 2024 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks · ☆390 · Jul 9, 2024 · Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) · ☆186 · Jul 5, 2024 · Updated last year
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks · ☆58 · Sep 26, 2024 · Updated last year
- ☆141 · Jun 28, 2024 · Updated last year
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning · ☆111 · May 28, 2025 · Updated 8 months ago
- A huge dataset for Document Visual Question Answering · ☆20 · Jul 29, 2024 · Updated last year
- Emu Series: Generative Multimodal Models from BAAI · ☆1,765 · Jan 12, 2026 · Updated last month
- ☆19 · Jul 25, 2024 · Updated last year
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation · ☆86 · Sep 12, 2024 · Updated last year
- [CVPR2025] Official code repository for SeTa: "Scale Efficient Training for Large Datasets" · ☆23 · Mar 18, 2025 · Updated 10 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences · ☆42 · Mar 11, 2025 · Updated 11 months ago
- [CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention · ☆39 · Mar 11, 2025 · Updated 11 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation · ☆1,928 · Aug 15, 2024 · Updated last year
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" · ☆89 · Oct 12, 2024 · Updated last year
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". · ☆27 · Oct 13, 2024 · Updated last year
- [AAAI 2023] DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding · ☆57 · Nov 28, 2022 · Updated 3 years ago
- Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model · ☆13 · Feb 15, 2024 · Updated last year
- ☆15 · Nov 11, 2024 · Updated last year
- ☆10 · Apr 7, 2025 · Updated 10 months ago
- Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth? · ☆144 · Feb 11, 2025 · Updated last year
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" · ☆129 · Aug 21, 2024 · Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts · ☆336 · Jul 17, 2024 · Updated last year
- Simple Implementation of Pix2seqV2(multi-task) · ☆26 · Dec 16, 2024 · Updated last year
- [NeurIPS 2024] VastTrack: Vast Category Visual Object Tracking · ☆73 · Sep 30, 2025 · Updated 4 months ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding" · ☆13 · Jun 11, 2023 · Updated 2 years ago
- ☆10 · Dec 3, 2023 · Updated 2 years ago
- ☆25 · Dec 23, 2024 · Updated last year
- [NeurIPS 2024] Repository for the paper "OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking". · ☆27 · Nov 9, 2024 · Updated last year
- When do we not need larger vision models? · ☆412 · Feb 8, 2025 · Updated last year
- ☆28 · Aug 21, 2023 · Updated 2 years ago
- ☆43 · May 6, 2024 · Updated last year
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception · ☆607 · May 8, 2024 · Updated last year
- Benchmarking Multi-Image Understanding in Vision and Language Models · ☆12 · Jul 29, 2024 · Updated last year
- Source code of the paper: Overlapped Trajectory-Enhanced Visual Tracking · ☆11 · Sep 3, 2024 · Updated last year