google / video-localized-narratives
☆60Aug 10, 2023Updated 2 years ago
Alternatives and similar repositories for video-localized-narratives
Users who are interested in video-localized-narratives are comparing it to the repositories listed below
- ☆16May 10, 2023Updated 2 years ago
- ☆13Jul 20, 2024Updated last year
- [CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?☆35Apr 27, 2023Updated 2 years ago
- ☆18Jul 9, 2024Updated last year
- ☆53Oct 16, 2023Updated 2 years ago
- ☆58Apr 24, 2024Updated last year
- ☆12Mar 10, 2023Updated 2 years ago
- Official code for DAM: Dynamic Adapter Merging for Continual Video QA Learning☆14Apr 25, 2024Updated last year
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"☆11Jun 18, 2024Updated last year
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!☆11May 24, 2023Updated 2 years ago
- Implementation of Pix2Seq in PyTorch☆10Feb 3, 2022Updated 4 years ago
- This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts…☆290Feb 12, 2024Updated 2 years ago
- [ICCV 2023 Workshop] The Official Implementation of The First Prize Solution for RVOS Competition☆14Jan 1, 2024Updated 2 years ago
- SMILE: A Multimodal Dataset for Understanding Laughter☆13Jun 15, 2023Updated 2 years ago
- Official This-Is-My Dataset published in CVPR 2023☆16Jul 18, 2024Updated last year
- Syphus: Automatic Instruction-Response Generation Pipeline☆14Dec 14, 2023Updated 2 years ago
- ☆18Jan 30, 2023Updated 3 years ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated last year
- ☆42Jul 9, 2025Updated 7 months ago
- ☆41Sep 25, 2023Updated 2 years ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆70Feb 28, 2024Updated last year
- Official repo for "DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer"☆19Sep 29, 2023Updated 2 years ago
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)☆16Jan 18, 2024Updated 2 years ago
- Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation (TIP 2024, ACM MM 2023)☆19Mar 13, 2024Updated last year
- ☆16May 26, 2023Updated 2 years ago
- 【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition☆38Apr 27, 2024Updated last year
- ☆58Dec 2, 2025Updated 2 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆45Nov 29, 2023Updated 2 years ago
- [AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning☆68Feb 16, 2024Updated last year
- ☆242Jun 4, 2025Updated 8 months ago
- ☆42Jan 22, 2024Updated 2 years ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer☆46Mar 29, 2024Updated last year
- ☆46Jan 11, 2024Updated 2 years ago
- The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting"☆16May 3, 2023Updated 2 years ago
- ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation (CVPR'25)☆18Apr 2, 2025Updated 10 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆27Jan 17, 2026Updated 3 weeks ago
- ☆85Aug 18, 2024Updated last year
- ☆17Mar 17, 2023Updated 2 years ago
- ☆49Nov 12, 2022Updated 3 years ago