SCZwangxiao / RTQ-MM2023
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
☆16 · Jan 31, 2024 · Updated 2 years ago
Alternatives and similar repositories for RTQ-MM2023
Users who are interested in RTQ-MM2023 are comparing it to the repositories listed below
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ☆26 · Jan 1, 2026 · Updated last month
- Composed Video Retrieval ☆62 · May 2, 2024 · Updated last year
- Official PyTorch code of GroundVQA (CVPR'24) ☆64 · Sep 13, 2024 · Updated last year
- [WACV 2025] Official PyTorch code for "Background-aware Moment Detection for Video Moment Retrieval" ☆16 · Feb 24, 2025 · Updated 11 months ago
- [CVPR'24] OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition ☆38 · Apr 27, 2024 · Updated last year
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆41 · Sep 25, 2023 · Updated 2 years ago
- [ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition ☆97 · Jan 14, 2025 · Updated last year
- SUPERVAIZER is a toolkit built for the age of AI interoperability. At its core, it implements Google's Agent-to-Agent (A2A) protocol, ena… ☆14 · Feb 4, 2026 · Updated last week
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆49 · Jan 9, 2024 · Updated 2 years ago
- Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos ☆28 · Dec 8, 2023 · Updated 2 years ago
- ☆31 · Mar 24, 2022 · Updated 3 years ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024) ☆32 · Mar 29, 2024 · Updated last year
- Self hosted AI workflow for scraping Instagram Reels (audio and description). Extracting, summarising and categorising, then storing all … ☆27 · Jan 10, 2026 · Updated last month
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆18 · Jul 10, 2025 · Updated 7 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆42 · Mar 11, 2025 · Updated 11 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval ☆41 · Apr 11, 2025 · Updated 10 months ago
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval ☆22 · Jun 23, 2025 · Updated 7 months ago
- ☆11 · Dec 6, 2024 · Updated last year
- UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models ☆35 · Dec 29, 2025 · Updated last month
- [NeurIPS 2024] SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion ☆100 · Oct 29, 2025 · Updated 3 months ago
- Weakly Supervised Referring Video Object Segmentation with Object-Centric Pseudo-Guidance ☆10 · Aug 17, 2024 · Updated last year
- ☆13 · Nov 28, 2021 · Updated 4 years ago
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight] ☆65 · Jul 8, 2025 · Updated 7 months ago
- Code for the WWW '19 paper "Event Detection using Hierarchical Multi-Aspect Attention" ☆10 · Oct 12, 2020 · Updated 5 years ago
- Agentic translation using reflection workflow, refactored and sugared. ☆11 · Sep 25, 2024 · Updated last year
- Official Implementation of DMT: Dual Mean-Teacher in PyTorch. ☆10 · Oct 27, 2023 · Updated 2 years ago
- Third place of 2021 IEEE GRSS Data Fusion Contest: Track MSD ☆10 · Mar 31, 2021 · Updated 4 years ago
- Code related to the paper "MobileMEF: Fast and Efficient Method for Multi-Exposure Fusion" ☆12 · Dec 14, 2024 · Updated last year
- ☆14 · Jan 5, 2022 · Updated 4 years ago
- Agentic framework combining the power of LLMs with domain-specific tools for materials science, enabling property extraction, simulations… ☆11 · May 1, 2025 · Updated 9 months ago
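- On the path of growing as a developer ☆10 · Dec 25, 2018 · Updated 7 years ago
- Trigger-word-based gas incident event extraction, covering entity information such as time, location, cause, consequence, and organization ☆10 · Apr 13, 2021 · Updated 4 years ago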
- In this project, a facial recognition algorithm is implemented in Python using the PCA and SVD dimensionality reduction tools. ☆10 · Sep 2, 2019 · Updated 6 years ago
- Official implementation of "In-style: Bridging Text and Uncurated Videos with Style Transfer for Cross-modal Retrieval." ICCV 2023 ☆11 · Oct 5, 2023 · Updated 2 years ago
- Explore from keyword search to dense retrieval and reranking, which injects the intelligence of LLMs into your search system, making it f… ☆14 · Aug 27, 2023 · Updated 2 years ago
- Open-Retrieval Conversational Machine Reading: A new setting & OR-ShARC dataset ☆13 · Nov 19, 2022 · Updated 3 years ago
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding. ☆11 · Jul 16, 2024 · Updated last year
- Empowering Small VLMs to Think with Dynamic Memorization and Exploration ☆15 · Nov 18, 2025 · Updated 2 months ago
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr… ☆34 · Jul 3, 2025 · Updated 7 months ago