VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
☆86 · Sep 12, 2024 · Updated last year
Alternatives and similar repositories for VL-GPT
Users who are interested in VL-GPT are comparing it to the repositories listed below.
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆27 · Oct 13, 2024 · Updated last year
- 【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge ☆15 · Jul 18, 2023 · Updated 2 years ago
- Emu Series: Generative Multimodal Models from BAAI ☆1,765 · Jan 12, 2026 · Updated last month
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching". ☆14 · Sep 1, 2022 · Updated 3 years ago
- [ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface" ☆360 · Jan 14, 2025 · Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆336 · Jul 17, 2024 · Updated last year
- ☆72 · Mar 10, 2025 · Updated 11 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆147 · Nov 14, 2024 · Updated last year
- ☆19 · Apr 28, 2023 · Updated 2 years ago
- Physics-based Zero-Shot Video Generation ☆31 · Oct 4, 2024 · Updated last year
- Recent LLM-based CV and related works. Welcome to comment/contribute! ☆874 · Mar 8, 2025 · Updated 11 months ago
- ☆17 · Nov 17, 2023 · Updated 2 years ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆249 · Apr 3, 2024 · Updated last year
- [ECCV 2024] Tokenize Anything via Prompting ☆603 · Dec 11, 2024 · Updated last year
- Understanding Self-Supervised Learning in a non-IID Setting ☆21 · Oct 21, 2022 · Updated 3 years ago
- MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster] ☆23 · Dec 10, 2025 · Updated 2 months ago
- Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models ☆313 · Dec 28, 2023 · Updated 2 years ago
- ☆285 · Aug 14, 2025 · Updated 6 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆322 · Jan 20, 2025 · Updated last year
- A paper list of self-supervised pretrain method ☆22 · Aug 15, 2025 · Updated 6 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆391 · Jul 9, 2024 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆459 · Dec 2, 2024 · Updated last year
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆69 · Jun 9, 2024 · Updated last year
- ☆156 · Oct 31, 2024 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆640 · Sep 21, 2024 · Updated last year
- ☆21 · Apr 17, 2025 · Updated 10 months ago
- Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022) ☆85 · Nov 2, 2022 · Updated 3 years ago
- [ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models ☆46 · Jan 8, 2025 · Updated last year
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆131 · Aug 21, 2024 · Updated last year
- A curated list of papers and resources for text-to-image evaluation. ☆30 · Sep 6, 2023 · Updated 2 years ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆42 · Dec 16, 2025 · Updated 2 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆281 · Jun 25, 2024 · Updated last year
- [NeurIPS 2024] Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation ☆70 · Oct 27, 2024 · Updated last year
- ICCV'23 | Adverse Weather Removal with Codebook Priors ☆10 · Aug 28, 2023 · Updated 2 years ago
- Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis (ACCV 2022) ☆10 · Jul 22, 2024 · Updated last year
- ☆10 · May 12, 2018 · Updated 7 years ago
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model ☆106 · Mar 24, 2025 · Updated 11 months ago
- ☆401 · Dec 12, 2024 · Updated last year
- [NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?" ☆183 · Mar 4, 2024 · Updated last year