google / video-localized-narratives
☆59 · Updated last year
Alternatives and similar repositories for video-localized-narratives:
Users interested in video-localized-narratives are comparing it to the repositories listed below.
- [CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations" ☆52 · Updated last year
- ☆48 · Updated last year
- ☆72 · Updated 9 months ago
- ☆56 · Updated 9 months ago
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa… ☆76 · Updated 2 years ago
- Official repository of paper "Subobject-level Image Tokenization" ☆65 · Updated 9 months ago
- Language Repository for Long Video Understanding ☆31 · Updated 8 months ago
- Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight) ☆32 · Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 · Updated 8 months ago
- [CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings ☆45 · Updated last year
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆97 · Updated 9 months ago
- A Unified Framework for Video-Language Understanding ☆56 · Updated last year
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆92 · Updated 3 months ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆89 · Updated 7 months ago
- [NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆29 · Updated 2 months ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties ☆118 · Updated 3 months ago
- Official repository for the General Robust Image Task (GRIT) Benchmark ☆51 · Updated last year
- ☆64 · Updated last year
- Code for the paper "CiT: Curation in Training for Effective Vision-Language Data" ☆78 · Updated 2 years ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆24 · Updated 4 months ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering" ☆18 · Updated 2 years ago
- [ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning ☆64 · Updated 2 years ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters" ☆44 · Updated last month
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆59 · Updated 5 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆40 · Updated last month
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆54 · Updated last year
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity" ☆56 · Updated last year
- Recursive Visual Programming (ECCV 2024) ☆17 · Updated 2 months ago
- Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding" ☆48 · Updated 3 weeks ago
- ☆83 · Updated last year