appletea233 / AL-Ref-SAM2
AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
☆72Updated last month
Alternatives and similar repositories for AL-Ref-SAM2:
Users that are interested in AL-Ref-SAM2 are comparing it to the libraries listed below
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation☆75Updated 7 months ago
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…☆63Updated 7 months ago
- ☆79Updated 11 months ago
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models☆112Updated 4 months ago
- [NeurlPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆95Updated last month
- ☆37Updated 4 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆153Updated 5 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆65Updated 3 months ago
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆78Updated 2 months ago
- Official PyTorch implementation of GeoDiffusion in ICLR 2024 (https://arxiv.org/abs/2306.04607)☆74Updated 2 weeks ago
- ☆58Updated last year
- [ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction☆181Updated 11 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model☆92Updated 6 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆81Updated 4 months ago
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation☆46Updated 6 months ago
- Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".☆92Updated last month
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation☆20Updated 6 months ago
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation.☆37Updated 3 weeks ago
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)☆50Updated last week
- state-of-the-art open vocabulary detector on COCO/LVIS/V3Det☆29Updated 9 months ago
- Video Reasoning Segmentation☆19Updated 2 months ago
- This is a PyTorch implementation of MCLN proposed by our paper "Multi-branch Collaborative Learning Network for 3D Visual Grounding"(ECCV…☆14Updated 3 months ago
- Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use.☆20Updated 3 weeks ago
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration"☆67Updated 4 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆85Updated 2 weeks ago
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection☆46Updated 2 weeks ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution☆39Updated 2 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆66Updated 3 months ago
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models".☆28Updated 2 weeks ago
- Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"☆23Updated 2 weeks ago