FoundationVision / GLEE

[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

☆1,165

Alternatives and similar repositories for GLEE

Users who are interested in GLEE are comparing it to the repositories listed below.

HarborYuan / ovsam
[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
☆1,020 Updated 4 months ago
bytedance / Sa2VA
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"
☆1,446 Updated this week
lxtGH / OMG-Seg
Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
☆1,332 Updated last month
chongzhou96 / EdgeSAM
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
☆1,085 Updated 6 months ago
CircleRadon / Osprey
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
☆837 Updated 3 months ago
dvlab-research / LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,502 Updated 9 months ago
shenyunhang / APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
☆599 Updated last year
MasterBin-IIAU / UNINEXT
[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval
☆1,280 Updated 2 years ago
FoundationVision / Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
☆579 Updated last year
IDEA-Research / Grounding-DINO-1.5-API
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
☆1,061 Updated 10 months ago
baaivision / tokenize-anything
[ECCV 2024] Tokenize Anything via Prompting
☆599 Updated 11 months ago
jianzongwu / Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
☆966 Updated 8 months ago
ZrrSkywalker / Personalize-SAM
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
☆1,634 Updated last year
yoxu515 / aot-benchmark
An efficient modular implementation of Associating Objects with Transformers for Video Object Segmentation in PyTorch
☆577 Updated last year
IDEA-Research / OpenSeeD
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
☆740 Updated last year
Mark12Ding / SAM2Long
[ICCV 2025] SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
☆533 Updated 4 months ago
NVlabs / Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
☆902 Updated last month
OpenGVLab / VisionLLM
VisionLLM Series
☆1,128 Updated 9 months ago
xinghaochen / TinySAM
[AAAI 2025] Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model"
☆527 Updated 10 months ago
wanghao9610 / OV-DINO
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
☆381 Updated 8 months ago
qqlu / Entity
EntitySeg Toolbox: Towards Open-World and High-Quality Image Segmentation
☆1,034 Updated 2 years ago
UX-Decoder / DINOv
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
☆517 Updated last year
yformer / EfficientSAM
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
☆2,445 Updated 11 months ago
om-ai-lab / OmDet
Real-time and accurate open-vocabulary end-to-end object detection
☆1,354 Updated 11 months ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆930 Updated 3 months ago
IDEA-Research / DINO-X-API
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
☆1,290 Updated 4 months ago
aim-uofa / Matcher
[ICLR'24 & IJCV'25] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
☆536 Updated 11 months ago
z-x-yang / Segment-and-Track-Anything
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary alg…
☆3,080 Updated last year
tianrun-chen / SAM-Adapter-PyTorch
Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts
☆1,298 Updated last week
SkyworkAI / Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
☆577 Updated last year