ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
☆16Jan 31, 2024Updated 2 years ago
Alternatives and similar repositories for RTQ-MM2023
Users who are interested in RTQ-MM2023 are comparing it to the repositories listed below
- Human-centric environment representations from egocentric video☆14Feb 5, 2026Updated last month
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models☆24Jan 1, 2026Updated 2 months ago
- Composed Video Retrieval☆62May 2, 2024Updated last year
- [WACV 2025] Official Pytorch code for "Background-aware Moment Detection for Video Moment Retrieval"☆16Feb 24, 2025Updated last year
- 【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition☆38Apr 27, 2024Updated last year
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆41Sep 25, 2023Updated 2 years ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.☆47Oct 14, 2024Updated last year
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)☆26Jun 6, 2025Updated 9 months ago
- [ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition☆97Jan 14, 2025Updated last year
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆49Jan 9, 2024Updated 2 years ago
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment☆53Apr 9, 2024Updated last year
- SUPERVAIZER is a toolkit built for the age of AI interoperability. At its core, it implements Google's Agent-to-Agent (A2A) protocol, ena…☆14Feb 4, 2026Updated last month
- Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos☆28Dec 8, 2023Updated 2 years ago
- ☆31Mar 24, 2022Updated 3 years ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)☆32Mar 29, 2024Updated last year
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering☆46Jun 19, 2025Updated 8 months ago
- ☆12Sep 19, 2022Updated 3 years ago
- This project is an AI Recruitment System designed to accelerate the hiring process for HR and technical recruiters.☆14Jan 3, 2025Updated last year
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆19Jul 10, 2025Updated 8 months ago
- Self hosted AI workflow for scraping Instagram Reels (audio and description). Extracting, summarising and categorising, then storing all …☆28Jan 10, 2026Updated last month
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models☆36Feb 21, 2026Updated 2 weeks ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆41Apr 11, 2025Updated 10 months ago
- An AI-powered tool that translates plain English commands into multi-step API workflows, automating the entire testing process.☆17Jul 27, 2025Updated 7 months ago
- [NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion☆101Oct 29, 2025Updated 4 months ago
- ☆42Apr 7, 2024Updated last year
- ☆16Apr 28, 2023Updated 2 years ago
- Code for the WWW '19 paper "Event Detection using Hierarchical Multi-Aspect Attention"☆10Oct 12, 2020Updated 5 years ago
- ☆15Dec 2, 2025Updated 3 months ago
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.☆11Jul 16, 2024Updated last year
- An official pytorch implementation of the paper: [MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval].☆14Jul 27, 2024Updated last year
- Third place of 2021 IEEE GRSS Data Fusion Contest: Track MSD☆10Mar 31, 2021Updated 4 years ago
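- Trigger-word-based gas incident event extraction, covering entity information such as time, location, cause, consequence, and organization☆10Apr 13, 2021Updated 4 years ago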
- Agentic framework combining the power of LLMs with domain-specific tools for materials science, enabling property extraction, simulations…☆12May 1, 2025Updated 10 months ago
- Code related to the paper "MobileMEF: Fast and Efficient Method for Multi-Exposure Fusion"☆12Dec 14, 2024Updated last year
- Official implementation of "In-style: Bridging Text and Uncurated Videos with Style Transfer for Cross-modal Retrieval." ICCV 2023☆11Oct 5, 2023Updated 2 years ago
- [CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'☆13Jun 16, 2024Updated last year
- Dataset: UET Driver Activity Recognition☆10Apr 19, 2022Updated 3 years ago
- On the road to growing as a developer☆10Dec 25, 2018Updated 7 years ago
- Explore from keyword search to dense retrieval and reranking, which injects the intelligence of LLMs into your search system, making it f…☆14Aug 27, 2023Updated 2 years ago