iLearn-Lab/MM23-RTQ

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/iLearn-Lab/MM23-RTQ)

iLearn-Lab / MM23-RTQ

ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model

☆15

Alternatives and similar repositories for MM23-RTQ

Users that are interested in MM23-RTQ are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

facebookresearch / ego-env
View on GitHub
Human-centric environment representations from egocentric video
☆15Feb 5, 2026Updated 5 months ago
Becomebright / GroundVQA
View on GitHub
Official PyTorch code of GroundVQA (CVPR'24)
☆63Sep 13, 2024Updated last year
AIM-SKKU / QA-TIGER
View on GitHub
Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)
☆29Jun 6, 2025Updated last year
tomchen-ctj / CVPR23-LOVEU-AQTC
View on GitHub
【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge
☆15Jul 18, 2023Updated 3 years ago
OmkarThawakar / composed-video-retrieval
View on GitHub
Composed Video Retrieval
☆62May 2, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
alibaba-mmai-research / DiST
View on GitHub
ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
☆41Sep 25, 2023Updated 2 years ago
HuiGuanLab / UmURL
View on GitHub
This is a repository contains the implementation of our ACM MM 2023 paper Unified Multi-modal Unsupervised Representation Learning for Sk…
☆14Nov 9, 2023Updated 2 years ago
tomchen-ctj / OST
View on GitHub
【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
☆39Apr 27, 2024Updated 2 years ago
minjoong507 / BM-DETR
View on GitHub
[WACV 2025] Official Pytorch code for "Background-aware Moment Detection for Video Moment Retrieval"
☆16Feb 24, 2025Updated last year
LunarShen / DsicoVLA
View on GitHub
[CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
☆22Jun 23, 2025Updated last year
Visual-AI / FROSTER
View on GitHub
[ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition
☆101Jan 14, 2025Updated last year
jpthu17 / DiCoSA
View on GitHub
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
☆53Apr 9, 2024Updated 2 years ago
OpenGVLab / VRBench
View on GitHub
[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
☆28Jun 4, 2026Updated last month
nusnlp / d2vlm
View on GitHub
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24Apr 18, 2026Updated 3 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
SAGNIKMJR / ego-AV-spatial-correspondence
View on GitHub
[CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'
☆14Jun 16, 2024Updated 2 years ago
RenShuhuai-Andy / TESTA
View on GitHub
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆50Jan 9, 2024Updated 2 years ago
xiaoneil / LPNet
View on GitHub
☆13Nov 28, 2021Updated 4 years ago
foolwood / DRL
View on GitHub
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval
☆97Apr 7, 2022Updated 4 years ago
YYJMJC / Compositional-Temporal-Grounding
View on GitHub
☆31Mar 24, 2022Updated 4 years ago
SimonCherryGZ / TensorFlow_Android
View on GitHub
基于TensorFlow官方Android端示例，对选择的图片进行风格迁移
☆11Apr 9, 2017Updated 9 years ago
stogiannidis / srbench
View on GitHub
Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"
☆19Feb 1, 2026Updated 5 months ago
MGitHubL / TMac
View on GitHub
☆14Feb 26, 2024Updated 2 years ago
karchkha / MelSpec_GPT_VQVAE
View on GitHub
Audio Generation model working with GPT-2 and VQVAE compressed representation of MelSpectrograms
☆18Oct 8, 2023Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
showlab / MovieSeq
View on GitHub
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46Mar 11, 2025Updated last year
BolinLai / CSTS
View on GitHub
[ECCV2024] The official implementation of "Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation".
☆16Feb 24, 2025Updated last year
gyx-gloria / DMT
View on GitHub
Official Implementation of DMT: Dual Mean-Teacher in PyTorch.
☆10Oct 27, 2023Updated 2 years ago
DanDoge / Palm
View on GitHub
team Doggeee's solution to Ego4D LTA challenge@CVPRW23'
☆14Nov 4, 2023Updated 2 years ago
Ziyang412 / UCoFiA
View on GitHub
Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
☆66Jun 7, 2024Updated 2 years ago
dmoltisanti / air-cvpr23
View on GitHub
This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…
☆13May 25, 2023Updated 3 years ago
ExplainableML / EgoCVR
View on GitHub
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41Apr 11, 2025Updated last year
srijandas07 / clip_baseline_LTA_Ego4d
View on GitHub
Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022)
☆15Jul 4, 2022Updated 4 years ago
Minglu58 / TA2V
View on GitHub
☆15Dec 1, 2025Updated 7 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
2282588541a / HiRAG
View on GitHub
code for paper Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering
☆14Aug 13, 2024Updated last year
tonychenxyz / vit-interpret
View on GitHub
Official implementation of "Interpreting and Controlling Vision Foundation Models via Text Explanations"
☆14May 29, 2024Updated 2 years ago
dengandong / Videos-Publications-Collection
View on GitHub
This is a collection of publications about videos.
☆18Apr 29, 2021Updated 5 years ago
lntzm / MESM
View on GitHub
The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)
☆32Mar 29, 2024Updated 2 years ago
yl3800 / EIGV
View on GitHub
☆15Aug 12, 2022Updated 3 years ago
AmeenAli / VideoMatch
View on GitHub
☆14Jan 5, 2022Updated 4 years ago
MichWozPol / LEGO_StableDiffusion
View on GitHub
The project aim was to fine-tune the stable diffusion model in order to generate images in the LEGO style based on the prompt.
☆16Jun 7, 2023Updated 3 years ago