ninatu / howtocaption

Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024

☆55

Alternatives and similar repositories for howtocaption

Users who are interested in howtocaption are comparing it to the repositories listed below.


facebookresearch / EgoVLPv2
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
☆100 Updated last year
facebookresearch / htstep
HT-Step is a large-scale article grounding dataset of temporal step annotations on how-to videos
☆22 Updated last year
showlab / cosmo
☆73 Updated last year
zhaoyue-zephyrus / AVION
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
☆136 Updated 3 months ago
j-min / HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆107 Updated 10 months ago
danielchyeh / this-is-my
Official This-Is-My Dataset published in CVPR 2023
☆16 Updated last year
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆104 Updated last year
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆190 Updated last year
Lzq5 / Video-Text-Alignment
☆26 Updated 4 months ago
DCDmllm / Momentor
☆80 Updated last year
zjr2000 / LLMVA-GEBC
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
☆30 Updated last year
sudo-Boris / mr-Blip
Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"
☆92 Updated 8 months ago
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41 Updated 7 months ago
egoschema / EgoSchema
☆104 Updated 11 months ago
lucas-ventura / CoVR
Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".
☆118 Updated last month
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆62 Updated last year
TAU-VAILab / hierarcaps
Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)
☆32 Updated last year
jinhyunj / EaTR
Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)
☆53 Updated 2 years ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64 Updated last year
dhg-wei / TOPA
(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆30 Updated last year
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35 Updated 2 years ago
imagegridworth / IG-VLM
☆140 Updated last year
facebookresearch / ego4d-goalstep
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
☆52 Updated last year
klauscc / VindLU
☆110 Updated 2 years ago
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆83 Updated last year
StanfordVL / atp-video-language
Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…
☆50 Updated last year
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆146 Updated 5 months ago
CeeZh / SILVR
Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"
☆19 Updated 3 months ago
dmoltisanti / air-cvpr23
This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…
☆13 Updated 2 years ago
Chuhanxx / helping_hand_for_egocentric_videos
Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'
☆33 Updated 2 years ago