facebookresearch / video-distant-supervision
This is an official PyTorch implementation of "Learning To Recognize Procedural Activities with Distant Supervision". In this repository, we provide PyTorch code for training and testing as described in the paper. The proposed distant supervision framework achieves strong generalization performance on step classification, recognition of procedural…
☆43 · Feb 21, 2023 · Updated 2 years ago
Alternatives and similar repositories for video-distant-supervision
Users that are interested in video-distant-supervision are comparing it to the libraries listed below
- Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding" ☆50 · Jan 27, 2025 · Updated last year
- Code implementation for our ECCV, 2022 paper titled "My View is the Best View: Procedure Learning from Egocentric Videos" ☆34 · Feb 5, 2024 · Updated 2 years ago
- [CVPR25] Official Implementation of CAV-MAE Sync ☆30 · Jun 18, 2025 · Updated 7 months ago
- [ICLR 2024 Poster] SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos ☆20 · Aug 21, 2025 · Updated 5 months ago
- [CVPR'22 Oral] Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman. ☆119 · Oct 9, 2023 · Updated 2 years ago
- Official code repository for "Video-Mined Task Graphs for Keystep Recognition in Instructional Videos" arXiv, 2023 ☆14 · Apr 1, 2024 · Updated last year
- [ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing ☆27 · Jul 15, 2022 · Updated 3 years ago
- [CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations" ☆56 · Aug 8, 2023 · Updated 2 years ago
- ☆20 · Aug 19, 2024 · Updated last year
- MIMIC: Masked Image Modeling with Image Correspondences ☆16 · Jun 14, 2024 · Updated last year
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023) ☆107 · Jan 23, 2025 · Updated last year
- Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021) ☆17 · Jan 12, 2023 · Updated 3 years ago
- Prompt Generation Networks for Input-Space Adaptation of Frozen Vision Transformers. Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M… ☆44 · Sep 11, 2024 · Updated last year
- [ECCV 22] LocVTP: Video-Text Pre-training for Temporal Localization ☆39 · Jul 29, 2022 · Updated 3 years ago
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Jul 21, 2023 · Updated 2 years ago
- ☆19 · May 2, 2020 · Updated 5 years ago
- Code for recreating the HoS benchmark of VISOR ☆22 · Jul 2, 2023 · Updated 2 years ago
- HT-Step is a large-scale article grounding dataset of temporal step annotations on how-to videos ☆24 · Mar 20, 2024 · Updated last year
- [CVPR'23 Highlight] AutoAD: Movie Description in Context. ☆103 · Nov 6, 2024 · Updated last year
- Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos ☆28 · Sep 9, 2024 · Updated last year
- GPU-accelerated video decoder ☆20 · May 18, 2021 · Updated 4 years ago
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024 ☆58 · Aug 19, 2025 · Updated 5 months ago
- [ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, M… ☆28 · Jan 28, 2025 · Updated last year
- The official PyTorch implementation of the IEEE/CVF Computer Vision and Pattern Recognition (CVPR) '24 paper PREGO: online mistake detect… ☆31 · Jun 9, 2025 · Updated 8 months ago
- MERLOT: Multimodal Neural Script Knowledge Models ☆225 · Mar 15, 2022 · Updated 3 years ago
- ☆107 · Apr 11, 2022 · Updated 3 years ago
- [CVPR2022] Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos ☆101 · Oct 30, 2022 · Updated 3 years ago
- Official repository for the MMFM challenge ☆25 · Jun 18, 2024 · Updated last year
- [ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach ☆20 · Aug 2, 2021 · Updated 4 years ago
- Code for the VOST dataset ☆26 · Oct 1, 2023 · Updated 2 years ago
- The MECCANO Dataset: official repository in which we provide code and models. ☆32 · Jul 31, 2023 · Updated 2 years ago
- ☆29 · Jun 15, 2022 · Updated 3 years ago
- SVHF-Net for Cross-modal binary matching ☆32 · Aug 22, 2018 · Updated 7 years ago
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model' ☆33 · Nov 7, 2023 · Updated 2 years ago
- Official Implementation of "Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning." ICLR 2026. ☆30 · Feb 3, 2026 · Updated last week
- Self-Supervised Speech Pre-training and Representation Learning Toolkit. ☆10 · Feb 29, 2024 · Updated last year
- [KDD 2026 ADS Track] Pytorch implementation of the paper "Hi-Guard: Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasonin… ☆19 · Jan 13, 2026 · Updated last month
- Code for Learning to Learn Language from Narrated Video ☆33 · Oct 3, 2023 · Updated 2 years ago
- [AAAI 2022 Oral] This is a Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tail… ☆33 · Feb 17, 2022 · Updated 3 years ago