StanfordVL / moma

A dataset for multi-object multi-actor activity parsing

☆41

Alternatives and similar repositories for moma

Users who are interested in moma are comparing it to the libraries listed below.

salesforce / paprika
Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding"
☆50Updated 10 months ago
EGO4D / episodic-memory
☆130Updated last year
showlab / EgoVLP
[NeurIPS 2022] Egocentric Video-Language Pretraining
☆252Updated last year
OpenGVLab / EgoVideo
[CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Challenges in CVPR 2024
☆132Updated 6 months ago
JacobYuan7 / RLIPv2
[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training
☆135Updated last year
facebookresearch / EgoVLPv2
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
☆100Updated last year
clova-tool / CLOVA-tool
☆30Updated last year
Buzz-Beater / EgoTaskQA
Code for NeurIPS 2022 Datasets and Benchmarks paper - EgoTaskQA: Understanding Human Tasks in Egocentric Videos.
☆35Updated 2 years ago
fpv-iplab / EASG
Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024)
☆44Updated 7 months ago
facebookresearch / ProcedureVRL
[CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations"
☆55Updated 2 years ago
lbaermann / qaego4d
Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos"
☆29Updated 2 years ago
LilyDaytoy / OpenPVSG
Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23
☆100Updated last year
facebookresearch / ego4d-goalstep
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
☆51Updated last year
brown-palm / AntGPT
Official code implementation of the paper "AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"
☆25Updated last year
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆103Updated last year
EGO4D / forecasting
☆76Updated last year
leonnnop / VAR
[CVPR 2022] Visual Abductive Reasoning
☆123Updated last year
OpenGVLab / EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
☆74Updated 3 months ago
NVlabs / Bongard-HOI
[CVPR 2022 (oral)] Bongard-HOI for benchmarking few-shot visual reasoning
☆72Updated 3 years ago
epic-kitchens / epic-kitchens-100-annotations
Annotations for the public release of the EPIC-KITCHENS-100 dataset
☆158Updated 3 years ago
doc-doc / CoVGT
Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)
☆19Updated last year
zhaoyue-zephyrus / AVION
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
☆135Updated 3 months ago
kkahatapitiya / LangRepo
Code for our ACL 2025 paper "Language Repository for Long Video Understanding"
☆32Updated last year
YiwuZhong / SGG_from_NLS
[ICCV 2021] Official code for "Learning to Generate Scene Graph from Natural Language Supervision"
☆101Updated 2 years ago
scwangdyd / zero_shot_hoi
Discovering human interaction with novel objects via zero-shot learning, CVPR, 2020
☆42Updated 5 years ago
antoyang / FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆157Updated 11 months ago
JacobYuan7 / RLIP
[NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Grap…
☆78Updated last year
Kenneth-Wong / MMSceneGraph
ICCV 2021: A brand new hub for Scene Graph Generation methods based on MMdetection (2021). The pipeline of from detection, scene graph ge…
☆63Updated 4 years ago
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆189Updated last year
IDEA-Research / DiffHOI
Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model"
☆64Updated 2 years ago