Iriya99 / OVRE
☆22Updated 3 months ago
Alternatives and similar repositories for OVRE:
Users who are interested in OVRE are comparing it to the repositories listed below
- EventHallusion: Diagnosing Event Hallucinations in Video LLMs☆30Updated last month
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆37Updated 9 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆63Updated 7 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆49Updated 7 months ago
- ☆26Updated 5 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆18Updated 2 weeks ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.☆56Updated 5 months ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)☆29Updated 10 months ago
- [NeurIPS 2024] Lumen: a Large multimodal model with versatile vision-centric capabilities☆24Updated 4 months ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.☆36Updated 4 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆69Updated 3 weeks ago
- PyTorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)☆62Updated 8 months ago
- [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning☆112Updated last month
- [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations☆130Updated 10 months ago
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆86Updated 2 months ago
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation☆29Updated 11 months ago
- ☆65Updated 2 months ago
- ☆38Updated 10 months ago
- [NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding☆47Updated 11 months ago
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning"☆30Updated last year
- Official GitHub repo for ICCV 2023 paper 'Multi-event Video-Text Retrieval'☆18Updated last year
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection☆51Updated last month
- An official implementation for MS-DETR in ACL'23☆16Updated last year
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H…☆17Updated 9 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆55Updated 8 months ago
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"☆69Updated last month
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆39Updated last year
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)☆53Updated last month
- This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World"…☆45Updated 11 months ago