nguyentthong / video-language-understandingLinks

[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

☆45

Alternatives and similar repositories for video-language-understanding

Users that are interested in video-language-understanding are comparing it to the libraries listed below

Sorting:

chunmeifeng / SPRC
【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval
☆91Updated last year
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆62Updated last year
yellow-binary-tree / HawkEye
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆44Updated last year
houzhijian / CONE
[2023 ACL] CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding
☆31Updated 2 years ago
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆83Updated last year
doc-doc / NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
☆175Updated 4 months ago
kkzhang95 / Awesome-Composed-Multi-modal-Retrieval
A comprehensive survey of Composed Multi-modal Retrieval (CMR), including Composed Image Retrieval (CIR) and Composed Video Retrieval (CV…
☆71Updated 3 months ago
DCDmllm / Momentor
☆80Updated last year
icq-benchmark / icq-benchmark
☆20Updated 4 months ago
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling
☆138Updated 3 months ago
jpthu17 / EMCL
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
☆141Updated last year
jpthu17 / HBI
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
☆122Updated 11 months ago
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆61Updated last year
uvavision / SelfEQ
[CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations".
☆28Updated last year
jinhyunj / EaTR
Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)
☆53Updated 2 years ago
THUNLP-MT / Brote
☆11Updated 10 months ago
yeliudev / R2-Tuning
🌀 R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)
☆90Updated last year
VRU-NExT / VideoQA
☆99Updated 3 years ago
haokunwen / DQU-CIR
[SIGIR 2024] - Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval
☆43Updated last year
allenai / aokvqa
Official repository for the A-OKVQA dataset
☆104Updated last year
huangmozhi9527 / GMMFormer
[AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
☆20Updated last year
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64Updated last year
Pter61 / context-i2w
Context-I2W: Mapping Images to Context-dependent words for Accurate Zero-Shot Composed Image Retrieval [AAAI 2024 Oral]
☆55Updated 6 months ago
zhengrongz / AoTD
[CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".
☆52Updated 6 months ago
tmlr-group / WCA
[ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"
☆57Updated last year
Cuberick-Orion / Candidate-Reranking-CIR
The official implementation for Candidate Set Re-ranking for Composed Image Retrieval (TMLR) 01/2024
☆20Updated last year
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆63Updated last year
mengcaopku / LocVTP
[ECCV 22] LocVTP: Video-Text Pre-training for Temporal Localization
☆39Updated 3 years ago
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43Updated 5 months ago
afcedf / SOONet
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
☆26Updated last year