OpenGVLab/all-seeing

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/OpenGVLab/all-seeing)

OpenGVLab / all-seeing

[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of the Open World"

☆507

Alternatives and similar repositories for all-seeing

Users that are interested in all-seeing are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

jshilong / GPT4RoI
View on GitHub
(ECCVW 2025)GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556Jun 3, 2025Updated last year
mbzuai-oryx / groundingLMM
View on GitHub
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆963Aug 5, 2025Updated 11 months ago
FreedomIntelligence / ALLaVA
View on GitHub
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281Jun 25, 2024Updated 2 years ago
shikras / shikra
View on GitHub
☆814Jul 8, 2024Updated 2 years ago
BAAI-DCAI / Visual-Instruction-Tuning
View on GitHub
SVIT: Scaling up Visual Instruction Tuning
☆167Jun 20, 2024Updated 2 years ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
shikras / d-cube
View on GitHub
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating…
☆138Mar 20, 2024Updated 2 years ago
SALT-NLP / LLaVAR
View on GitHub
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆268Jun 12, 2024Updated 2 years ago
bytedance / fc-clip
View on GitHub
[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convoluti…
☆345Feb 5, 2024Updated 2 years ago
OpenGVLab / OmniCorpus
View on GitHub
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆425May 5, 2025Updated last year
bytedance / OmniScient-Model
View on GitHub
This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model
☆102Jul 15, 2024Updated 2 years ago
microsoft / X-Decoder
View on GitHub
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,346Oct 5, 2023Updated 2 years ago
V3Det / V3Det
View on GitHub
☆121Jun 11, 2024Updated 2 years ago
baaivision / CapsFusion
View on GitHub
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆215Feb 27, 2024Updated 2 years ago
baaivision / Emu
View on GitHub
Emu Series: Generative Multimodal Models from BAAI
☆1,776Jan 12, 2026Updated 6 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
baaivision / EVA
View on GitHub
EVA Series: Visual Representation Fantasies from BAAI
☆2,684Aug 1, 2024Updated last year
JIA-Lab-research / LISA
View on GitHub
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,662Feb 16, 2025Updated last year
microsoft / GLIP
View on GitHub
Grounded Language-Image Pre-training
☆2,605Jan 24, 2024Updated 2 years ago
cambrian-mllm / cambrian
View on GitHub
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008Nov 7, 2025Updated 8 months ago
FuxiaoLiu / LRV-Instruction
View on GitHub
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297Mar 13, 2024Updated 2 years ago
UX-Decoder / DINOv
View on GitHub
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
☆542Apr 8, 2024Updated 2 years ago
OpenGVLab / VisionLLM
View on GitHub
VisionLLM Series
☆1,152Feb 27, 2025Updated last year
facebookresearch / DCI
View on GitHub
Densely Captioned Images (DCI) dataset repository.
☆197Jul 1, 2024Updated 2 years ago
amazon-science / prompt-pretraining
View on GitHub
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
☆259May 3, 2024Updated 2 years ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
facebookresearch / VLPart
View on GitHub
[ICCV2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation
☆395Sep 19, 2023Updated 2 years ago
JialianW / GRiT
View on GitHub
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)
☆341Jan 8, 2024Updated 2 years ago
tsb0601 / MMVP
View on GitHub
☆364Jan 27, 2024Updated 2 years ago
baaivision / EVE
View on GitHub
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374Jul 24, 2025Updated 11 months ago
icoz69 / StableLLAVA
View on GitHub
Official repo for StableLLAVA
☆94Dec 22, 2023Updated 2 years ago
BAAI-DCAI / Bunny
View on GitHub
A family of lightweight multimodal models.
☆1,052Nov 18, 2024Updated last year
X2FD / LVIS-INSTRUCT4V
View on GitHub
☆134Dec 22, 2023Updated 2 years ago
baaivision / tokenize-anything
View on GitHub
[ECCV 2024] Tokenize Anything via Prompting
☆601Dec 11, 2024Updated last year
aim-uofa / SegPrompt
View on GitHub
Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning
☆112May 28, 2025Updated last year
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
jy0205 / LaVIT
View on GitHub
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603Oct 6, 2024Updated last year
wusize / CLIPSelf
View on GitHub
[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
☆207Feb 5, 2024Updated 2 years ago
FoundationVision / Groma
View on GitHub
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
☆585Jun 7, 2024Updated 2 years ago
IDEA-Research / OpenSeeD
View on GitHub
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
☆762Jan 22, 2024Updated 2 years ago
UX-Decoder / Semantic-SAM
View on GitHub
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
☆2,853Jul 10, 2025Updated last year
yuweihao / MM-Vet
View on GitHub
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆329Jan 20, 2025Updated last year
baaivision / DenseFusion
View on GitHub
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159Dec 6, 2024Updated last year