shenyunhang/APE


shenyunhang / APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

☆608

Alternatives and similar repositories for APE

Users who are interested in APE are comparing it to the repositories listed below.


VITA-MLLM / Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
☆649 · Dec 23, 2024 · Updated last year
FoundationVision / GLEE
[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
☆1,172 · Oct 21, 2024 · Updated last year
YifanXu74 / MQ-Det
Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)
☆346 · Feb 23, 2024 · Updated 2 years ago
UX-Decoder / DINOv
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
☆542 · Apr 8, 2024 · Updated 2 years ago
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆788 · Dec 8, 2025 · Updated 7 months ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964 · Aug 5, 2025 · Updated 11 months ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,605 · Jan 24, 2024 · Updated 2 years ago
HarborYuan / ovsam
[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
☆1,031 · Aug 4, 2025 · Updated 11 months ago
IDEA-Research / OpenSeeD
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
☆763 · Jan 22, 2024 · Updated 2 years ago
baaivision / tokenize-anything
[ECCV 2024] Tokenize Anything via Prompting
☆601 · Dec 11, 2024 · Updated last year
lxtGH / OMG-Seg
Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
☆1,351 · Oct 15, 2025 · Updated 9 months ago
FoundationVision / Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
☆585 · Jun 7, 2024 · Updated 2 years ago
bytedance / fc-clip
[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convoluti…
☆345 · Feb 5, 2024 · Updated 2 years ago
jianzongwu / Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
☆998 · May 12, 2026 · Updated 2 months ago
V3Det / V3Det
☆121 · Jun 11, 2024 · Updated 2 years ago
FoundationVision / GenerateU
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
☆196 · Mar 29, 2025 · Updated last year
wusize / CLIPSelf
[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
☆207 · Feb 5, 2024 · Updated 2 years ago
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆508 · Aug 9, 2024 · Updated last year
UX-Decoder / Semantic-SAM
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
☆2,853 · Jul 10, 2025 · Updated last year
Surrey-UP-Lab / RegionSpot
Recognize Any Regions
☆123 · Dec 18, 2024 · Updated last year
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 6 months ago
Charles-Xie / awesome-described-object-detection
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring E…
☆358 · Nov 6, 2025 · Updated 8 months ago
amazon-science / prompt-pretraining
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
☆259 · May 3, 2024 · Updated 2 years ago
baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,685 · Aug 1, 2024 · Updated last year
microsoft / X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,346 · Oct 5, 2023 · Updated 2 years ago
UX-Decoder / Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
☆4,795 · Aug 19, 2024 · Updated last year
JIA-Lab-research / LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,667 · Feb 16, 2025 · Updated last year
VITA-MLLM / VITA
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
☆2,521 · Mar 28, 2025 · Updated last year
CircleRadon / Osprey
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
☆843 · Aug 19, 2025 · Updated 11 months ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆198 · May 1, 2025 · Updated last year
berkeley-hipie / HIPIE
[NeurIPS2023] Code release for "Hierarchical Open-vocabulary Universal Image Segmentation"
☆294 · Jun 19, 2025 · Updated last year
shikras / d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating…
☆138 · Mar 20, 2024 · Updated 2 years ago
IDEA-Research / T-Rex
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
☆2,690 · Oct 15, 2025 · Updated 9 months ago
wusize / ovdet
[CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection
☆187 · Oct 25, 2023 · Updated 2 years ago
zhenyuw16 / UniDetector
Code release for our CVPR 2023 paper "Detecting Everything in the Open World: Towards Universal Object Detection".
☆588 · Apr 21, 2023 · Updated 3 years ago
BradyFU / DVG-Face
[TPAMI 2021] DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition
☆76 · Nov 13, 2023 · Updated 2 years ago
JIA-Lab-research / LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆861 · Jul 29, 2024 · Updated last year
bytedance / OmniScient-Model
This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model
☆102 · Jul 15, 2024 · Updated 2 years ago
jshilong / GPT4RoI
(ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556 · Jun 3, 2025 · Updated last year