facebookresearch / stepdiff
Data release for Step Differences in Instructional Video (CVPR24)
☆13Updated 9 months ago
Alternatives and similar repositories for stepdiff:
Users that are interested in stepdiff are comparing it to the libraries listed below
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset☆55Updated 7 months ago
- This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…☆13Updated last year
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'☆33Updated last year
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024)☆38Updated 2 weeks ago
- ☆59Updated last year
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".☆16Updated last year
- [ECCV2024, Oral, Best Paper Finalist]This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation …☆37Updated last month
- Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)☆40Updated 11 months ago
- [ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video☆21Updated 8 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆36Updated 3 weeks ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆24Updated 4 months ago
- ☆23Updated 2 years ago
- ☆33Updated 6 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆25Updated 2 months ago
- HT-Step is a large-scale article grounding dataset of temporal step annotations on how-to videos☆17Updated last year
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks☆58Updated 6 months ago
- Official code implemtation of paper AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?☆21Updated 6 months ago
- ☆29Updated 8 months ago
- Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022)☆13Updated 2 years ago
- ☆12Updated 8 months ago
- ☆45Updated this week
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023)☆26Updated 9 months ago
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆25Updated this week
- The official PyTorch implementation of the IEEE/CVF Computer Vision and Pattern Recognition (CVPR) '24 paper PREGO: online mistake detect…☆21Updated last week
- Code for Motion-aware Contrastive Video Representation Learning via Foreground-background Merging (CVPR 2022)☆46Updated last year
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆26Updated 6 months ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆41Updated last year
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)☆26Updated 7 months ago
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆30Updated last year
- (CVPR 2023) Official implemention of the paper "Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos…☆29Updated last year