zhouyiks / CoLVA
☆23 Updated last month
Alternatives and similar repositories for CoLVA:
Users that are interested in CoLVA are comparing it to the libraries listed below
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆20 Updated 4 months ago
- ☆38 Updated 4 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆32 Updated 8 months ago
- Open implementation of "RandAR" ☆53 Updated last month
- state-of-the-art open vocabulary detector on COCO/LVIS/V3Det ☆29 Updated 10 months ago
- ReNeg: Learning Negative Embedding with Reward Guidance ☆27 Updated last month
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ☆46 Updated 7 months ago
- ☆58 Updated last year
- ☆58 Updated last year
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆103 Updated last month
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆93 Updated 7 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆62 Updated 6 months ago
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ☆66 Updated 3 weeks ago
- PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆25 Updated 2 months ago
- ☆17 Updated last month
- Large-Vocabulary Video Instance Segmentation dataset ☆78 Updated 7 months ago
- [ECCV 2024] Official implementation of the paper "Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning… ☆25 Updated 6 months ago
- ☆27 Updated last year
- ☆28 Updated 4 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆29 Updated 3 months ago
- ☆16 Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆66 Updated 3 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆33 Updated 2 months ago
- ☆16 Updated last year
- Sambor: Boosting Segment Anything Model Towards Open-Vocabulary Learning ☆30 Updated last year
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆27 Updated 2 months ago
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆41 Updated 3 months ago