kkahatapitiya / LangRepo

Language Repository for Long Video Understanding

☆32

Alternatives and similar repositories for LangRepo

Users who are interested in LangRepo are comparing it to the repositories listed below.

orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆66 Updated last year
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆100 Updated 9 months ago
bigai-nlco / VideoLLaMB
[ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
☆71 Updated 5 months ago
imagegridworth / IG-VLM
☆138 Updated 10 months ago
kahnchana / mvu
🤖 [ICLR'25] Multimodal Video Understanding Framework (MVU)
☆45 Updated 6 months ago
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆94 Updated last year
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆120 Updated 4 months ago
egoschema / EgoSchema
☆94 Updated 7 months ago
facebookresearch / EgoVLPv2
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
☆99 Updated last year
mlvlab / Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
☆76 Updated 4 months ago
ChenYi99 / EgoPlan
☆71 Updated 8 months ago
lbaermann / qaego4d
Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"
☆27 Updated last year
facebookresearch / ego4d-goalstep
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
☆44 Updated last year
wxh1996 / VideoAgent
☆105 Updated 3 months ago
amazon-science / QA-ViT
☆69 Updated last year
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆105 Updated last year
YiyangZhou / POVID
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆86 Updated last year
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆72 Updated 10 months ago
agentic-learning-ai-lab / lifelong-memory
Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
☆24 Updated 7 months ago
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆131 Updated last month
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28 Updated 3 weeks ago
Hritikbansal / videocon
☆58 Updated last year
allenai / unified-io-2.pytorch
☆76 Updated last year
RifleZhang / LLaVA-Hound-DPO
☆152 Updated 9 months ago
alanaai / EVUD
Egocentric Video Understanding Dataset (EVUD)
☆30 Updated last year
longvideobench / LongVideoBench
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆104 Updated last year
patrick-tssn / VideoHallucer
VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆36 Updated 4 months ago
UMass-Embodied-AGI / CoVLM
[ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆45 Updated last month
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆40 Updated 4 months ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆33 Updated 8 months ago