SCZwangxiao/RTQ-MM2023

ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model

☆16

Alternatives and similar repositories for RTQ-MM2023

Users who are interested in RTQ-MM2023 are comparing it to the repositories listed below.

facebookresearch / ego-env
Human-centric environment representations from egocentric video
☆14 · Feb 5, 2026 · Updated last month
nusnlp / d2vlm
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24 · Jan 1, 2026 · Updated 2 months ago
AIM-SKKU / QA-TIGER
Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)
☆28 · Jun 6, 2025 · Updated 9 months ago
tomchen-ctj / CVPR23-LOVEU-AQTC
【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge
☆15 · Jul 18, 2023 · Updated 2 years ago
alibaba-mmai-research / DiST
ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
☆41 · Sep 25, 2023 · Updated 2 years ago
knightyxp / DGL
[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
☆47 · Oct 14, 2024 · Updated last year
minjoong507 / BM-DETR
[WACV 2025] Official Pytorch code for "Background-aware Moment Detection for Video Moment Retrieval"
☆16 · Feb 24, 2025 · Updated last year
Akshit21112002 / TTRV
TTRV: Test-Time Reinforcement Learning for Vision–Language Models (CVPR 2026)
☆37 · Mar 8, 2026 · Updated 3 weeks ago
tomchen-ctj / OST
【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
☆37 · Apr 27, 2024 · Updated last year
jpthu17 / DiCoSA
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
☆53 · Apr 9, 2024 · Updated last year
Visual-AI / FROSTER
[ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition
☆97 · Jan 14, 2025 · Updated last year
MGitHubL / TMac
☆13 · Feb 26, 2024 · Updated 2 years ago
Heven-Pan / UFVideo
[CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models
☆37 · Feb 21, 2026 · Updated last month
SAGNIKMJR / ego-AV-spatial-correspondence
[CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'
☆13 · Jun 16, 2024 · Updated last year
zsgvivo / VideoZoomer
☆28 · Feb 12, 2026 · Updated last month
YYJMJC / Compositional-Temporal-Grounding
☆31 · Mar 24, 2022 · Updated 4 years ago
xiaoneil / LPNet
☆13 · Nov 28, 2021 · Updated 4 years ago
foolwood / DRL
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval
☆98 · Apr 7, 2022 · Updated 3 years ago
yl3800 / EIGV
☆15 · Aug 12, 2022 · Updated 3 years ago
stogiannidis / srbench
Source code for the paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"
☆19 · Feb 1, 2026 · Updated last month
EnVision-Research / A4-Agent
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning
☆37 · Mar 12, 2026 · Updated 2 weeks ago
karchkha / MelSpec_GPT_VQVAE
Audio Generation model working with GPT-2 and VQVAE compressed representation of MelSpectrograms
☆18 · Oct 8, 2023 · Updated 2 years ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆44 · Mar 11, 2025 · Updated last year
BolinLai / CSTS
[ECCV2024] The official implementation of "Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation".
☆13 · Feb 24, 2025 · Updated last year
GeWu-Lab / TSPM
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
☆16 · Oct 25, 2024 · Updated last year
iSEE-Laboratory / Long_RVOS
(CVPR 2026) Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation
☆31 · Feb 28, 2026 · Updated last month
Ziyang412 / UCoFiA
Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
☆66 · Jun 7, 2024 · Updated last year
dmoltisanti / air-cvpr23
This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…
☆13 · May 25, 2023 · Updated 2 years ago
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41 · Apr 11, 2025 · Updated 11 months ago
deep-spin / Infinite-Video
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
☆20 · Feb 14, 2025 · Updated last year
lntzm / MESM
The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)
☆32 · Mar 29, 2024 · Updated 2 years ago
srijandas07 / clip_baseline_LTA_Ego4d
Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022)
☆15 · Jul 4, 2022 · Updated 3 years ago
chakravarthi589 / Video-Question-Answering_Resources
Video Question Answering | Video QA | VQA
☆92 · Nov 17, 2025 · Updated 4 months ago
tonychenxyz / vit-interpret
Official implementation of "Interpreting and Controlling Vision Foundation Models via Text Explanations"
☆14 · May 29, 2024 · Updated last year
AmeenAli / VideoMatch
☆14 · Jan 5, 2022 · Updated 4 years ago
WHB139426 / GCG
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]
☆10 · Jul 22, 2024 · Updated last year
GeWu-Lab / PSTP-Net
☆17 · Aug 11, 2023 · Updated 2 years ago
zjr2000 / GVL
Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
☆28 · Dec 8, 2023 · Updated 2 years ago
EverM0re / LiCoMemory
[arXiv'25] LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning
☆35 · Jan 6, 2026 · Updated 2 months ago