AILab-CVC/VL-GPT

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

☆86

Alternatives and similar repositories for VL-GPT

Users that are interested in VL-GPT are comparing it to the libraries listed below.

MengLcool / SEGIC
[ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation".
☆27 · Oct 13, 2024 · Updated last year
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 6 months ago
tomchen-ctj / CVPR23-LOVEU-AQTC
[CVPRW'23] First Place Solution to the CVPR'2023 AQTC Challenge
☆15 · Jul 18, 2023 · Updated 3 years ago
DefengXie / Edit_Everything
☆19 · Apr 28, 2023 · Updated 3 years ago
mezzelfo / MotionCraft
Physics-based Zero-Shot Video Generation
☆31 · Oct 4, 2024 · Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆150 · Nov 14, 2024 · Updated last year
OpenGVLab / MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
☆255 · Apr 3, 2024 · Updated 2 years ago
Haiyang-W / GiT
[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
☆364 · Jan 14, 2025 · Updated last year
yuweihao / MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆330 · Jan 20, 2025 · Updated last year
hustvl / OpenInst
☆17 · Nov 17, 2023 · Updated 2 years ago
OpenGVLab / STM-Evaluation
☆70 · Jun 9, 2026 · Updated last month
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆462 · Dec 2, 2024 · Updated last year
baaivision / tokenize-anything
[ECCV 2024] Tokenize Anything via Prompting
☆601 · Dec 11, 2024 · Updated last year
patrick-tssn / VideoHallucer
VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆43 · Dec 16, 2025 · Updated 7 months ago
DirtyHarryLYL / LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
☆871 · Mar 8, 2025 · Updated last year
WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆338 · Jul 17, 2024 · Updated 2 years ago
RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year
AILab-CVC / SEED
Official implementation of SEED-LLaMA (ICLR 2024).
☆642 · Sep 21, 2024 · Updated last year
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆132 · Aug 21, 2024 · Updated last year
fundamentalvision / Uni-Perceiver
☆291 · Aug 14, 2025 · Updated 11 months ago
liruiw / Dec-SSL
Understanding Self-Supervised Learning in a non-IID Setting
☆21 · Oct 21, 2022 · Updated 3 years ago
Zeqiang-Lai / Mini-DALLE3
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
☆313 · Dec 28, 2023 · Updated 2 years ago
lizhaoliu-Lec / CG-VLM
This is the official repo for "Contrastive Vision-Language Alignment Makes Efficient Instruction Learner".
☆20 · Dec 1, 2023 · Updated 2 years ago
HDETR / H-PETR-Pose
[CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".
☆14 · Sep 1, 2022 · Updated 3 years ago
Meituan-AutoML / VisionLLaMA
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
☆392 · Jul 9, 2024 · Updated 2 years ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69 · Jun 9, 2024 · Updated 2 years ago
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281 · Jun 25, 2024 · Updated 2 years ago
haonan3 / V1
V1: Toward Multimodal Reasoning by Designing Auxiliary Task
☆36 · Apr 14, 2025 · Updated last year
Owen718 / AWRCP
ICCV'23 | Adverse Weather Removal with Codebook Priors
☆10 · Aug 28, 2023 · Updated 2 years ago
ggjy / DeLVM
☆120 · Jun 6, 2024 · Updated 2 years ago
yhZhai / mcm
[NeurIPS 2024] Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
☆71 · Oct 27, 2024 · Updated last year
ChangyaoTian / ADDP
The official implementation of ADDP (ICLR 2024)
☆12 · Mar 27, 2024 · Updated 2 years ago
sled-group / InfEdit
[CVPR 2024] Official implementation of "Inversion-Free Image Editing with Natural Language"
☆362 · May 28, 2024 · Updated 2 years ago
NingWang2049 / STIGPN
Space-Time Interaction Graph Parsing Networks for Human-Object Interaction Recognition, ACM MM'21
☆14 · May 12, 2022 · Updated 4 years ago
zai-org / LVBench
[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark
☆145 · Jul 9, 2025 · Updated last year
UMass-Embodied-AGI / FlexAttention
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆49 · Jan 8, 2025 · Updated last year
youngtboy / Awesome-Self-Supervised-Vision-Pretrain
A paper list of self-supervised pretraining methods
☆24 · Jun 16, 2026 · Updated last month
zhenyuw16 / CompAgent_code
Code release for our paper "Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation".
☆18 · Jan 30, 2024 · Updated 2 years ago
kohjingyu / gill
🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".
☆470 · Jan 19, 2024 · Updated 2 years ago