liveseongho / Awesome-Video-Language-UnderstandingLinks

A Survey on video and language understanding.

☆50

Alternatives and similar repositories for Awesome-Video-Language-Understanding

Users that are interested in Awesome-Video-Language-Understanding are comparing it to the libraries listed below

Sorting:

klauscc / VindLU
☆108Updated 2 years ago
Nicous20 / FunQA
FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …
☆102Updated 7 months ago
j-min / HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆102Updated 5 months ago
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆186Updated last year
tsujuifu / pytorch_empirical-mvm
A PyTorch implementation of EmpiricalMVM
☆41Updated last year
facebookresearch / EgoVLPv2
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
☆99Updated last year
TXH-mercury / COSA
[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
☆43Updated 6 months ago
yangbang18 / MultiCapCLIP
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆35Updated 11 months ago
ChenDelong1999 / polite-flamingo
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
☆64Updated last year
artemisp / LAVIS-XInstructBLIP
LAVIS - A One-stop Library for Language-Vision Intelligence
☆48Updated 11 months ago
RifleZhang / LLaVA-Hound-DPO
☆152Updated 8 months ago
DCDmllm / Momentor
☆76Updated 7 months ago
showlab / cosmo
☆71Updated last year
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆118Updated 3 months ago
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆94Updated last year
jayleicn / singularity
[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"
☆135Updated 2 years ago
imagegridworth / IG-VLM
☆137Updated 9 months ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆62Updated 3 months ago
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆67Updated 10 months ago
X2FD / LVIS-INSTRUCT4V
☆133Updated last year
microsoft / LAVENDER
A Unified Framework for Video-Language Understanding
☆57Updated 2 years ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58Updated 2 years ago
mlfoundations / VisIT-Bench
☆50Updated last year
kkahatapitiya / LangRepo
Language Repository for Long Video Understanding
☆31Updated last year
MikeWangWZHL / Paxion
Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight
☆37Updated 2 years ago
thunlp / Muffin
☆65Updated last year
MikeWangWZHL / VidIL
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆115Updated 2 years ago
zjr2000 / LLMVA-GEBC
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
☆29Updated last year
mlvlab / Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
☆74Updated 3 months ago
X-PLUG / mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
☆227Updated 2 years ago