facebookresearch / htstep

HT-Step is a large-scale article grounding dataset of temporal step annotations on how-to videos

☆20

Alternatives and similar repositories for htstep

Users who are interested in htstep are comparing it to the repositories listed below.

facebookresearch / EgoVLPv2
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
☆99 Updated last year
facebookresearch / ego4d-goalstep
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
☆44 Updated last year
ninatu / howtocaption
Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024
☆55 Updated 10 months ago
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆39 Updated 3 months ago
benedettaliberatori / T3AL
Official implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024
☆64 Updated 10 months ago
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆77 Updated last year
lucas-ventura / CoVR
Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".
☆109 Updated 4 months ago
dmoltisanti / air-cvpr23
This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…
☆13 Updated 2 years ago
zhaoyue-zephyrus / AVION
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
☆133 Updated last year
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆61 Updated 10 months ago
wlin-at / MAXI
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023)
☆31 Updated last year
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆100 Updated 9 months ago
fmu2 / snag_release
Official Implementation of SnAG (CVPR 2024)
☆51 Updated 3 months ago
StanfordVL / atp-video-language
Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…
☆51 Updated last year
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆26 Updated 2 months ago
TAU-VAILab / hierarcaps
Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)
☆29 Updated 11 months ago
antoyang / FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆157 Updated 8 months ago
alibaba-mmai-research / DiST
ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
☆41 Updated last year
OpenGVLab / EgoVideo
[CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Challenges in CVPR 2024
☆127 Updated 2 months ago
houzhijian / GroundNLQ
The champion solution for Ego4D Natural Language Queries Challenge in CVPR 2023
☆17 Updated last year
j-min / HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆102 Updated 6 months ago
aleflabo / PREGO
The official PyTorch implementation of the IEEE/CVF Computer Vision and Pattern Recognition (CVPR) '24 paper PREGO: online mistake detect…
☆24 Updated last month
dhg-wei / TOPA
(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆31 Updated 10 months ago
rohithpeddi / SceneSayer
[ECCV 2024 (Oral)] Towards Scene Graph Anticipation
☆17 Updated 8 months ago
Jazzcharles / Egoinstructor
Pytorch implementation for Egoinstructor at CVPR 2024
☆23 Updated 8 months ago
jinhyunj / EaTR
Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)
☆50 Updated last year
Chuhanxx / helping_hand_for_egocentric_videos
Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'
☆33 Updated last year
HJYao00 / Side4Video
☆40 Updated last year
zihuixue / AlignEgoExo
Code and data release for the paper "Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Align…
☆18 Updated last year
Lzq5 / Video-Text-Alignment
☆25 Updated 3 weeks ago