VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Captioning Dataset.
☆78 · Dec 5, 2022 · Updated 3 years ago
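Since the dataset is distributed as (video-URL, caption) pairs, a minimal loader can be sketched as below. Note the column names and CSV layout here are assumptions for illustration only; consult the videoCC-data repository for the actual released schema.

```python
import csv
import io

# Hypothetical sample in the assumed layout: one row per (video-URL, caption)
# pair. The header names "video_url" and "caption" are illustrative, not the
# official schema.
SAMPLE = """video_url,caption
https://example.com/v/abc123,a dog catching a frisbee in a park
https://example.com/v/def456,time-lapse of clouds over a city skyline
"""

def load_pairs(text):
    """Parse CSV text into a list of (video_url, caption) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["video_url"], row["caption"]) for row in reader]

pairs = load_pairs(SAMPLE)
print(len(pairs))    # number of pairs parsed
print(pairs[0][1])   # caption of the first pair
```

In practice the URLs would be passed to a downloader and the captions used as training targets for a video-text model.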
Alternatives and similar repositories for videoCC-data
Users interested in videoCC-data are comparing it to the libraries listed below.
- VaLM: Visually-augmented Language Modeling. ICLR 2023. (☆56 · Mar 6, 2023 · Updated 3 years ago)
- Inverse DALL-E for Optical Character Recognition (☆38 · Oct 14, 2022 · Updated 3 years ago)
- Code release for "Learning Video Representations from Large Language Models" (☆536 · Oct 1, 2023 · Updated 2 years ago)
- Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs" (☆90 · Jun 6, 2025 · Updated 9 months ago)
- Code for the paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" (☆47 · Feb 19, 2026 · Updated 2 weeks ago)
- [ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa… (☆76 · Feb 21, 2022 · Updated 4 years ago)
- Attempt at a cog wrapper for an SDXL CLIP Interrogator (☆10 · May 16, 2024 · Updated last year)
- [TPAMI 2024] Code and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset (☆306 · Dec 25, 2024 · Updated last year)
- Multi-modality pre-training (☆510 · May 8, 2024 · Updated last year)
- Release of ImageNet-Captions (☆51 · Jan 20, 2023 · Updated 3 years ago)
- ☆54 · Jul 31, 2022 · Updated 3 years ago
- Code and models for "GeneCIS: A Benchmark for General Conditional Image Similarity" (☆61 · Jun 12, 2023 · Updated 2 years ago)
- ☆53 · Apr 17, 2022 · Updated 3 years ago
- This repo contains documentation and code needed to use the PACO dataset: data loaders and training and evaluation scripts for objects, parts… (☆292 · Feb 12, 2024 · Updated 2 years ago)
- Channelized Axial Attention for Semantic Segmentation (AAAI 2022) (☆31 · Jul 12, 2022 · Updated 3 years ago)
- ☆17 · Oct 18, 2022 · Updated 3 years ago
- ☆11 · Dec 8, 2022 · Updated 3 years ago
- DataComp: In search of the next generation of multimodal datasets (☆772 · Apr 28, 2025 · Updated 10 months ago)
- WIT (Wikipedia-based Image Text) Dataset is a large multimodal, multilingual dataset comprising 37M+ image-text sets with 11M+ unique imag… (☆1,100 · Sep 27, 2024 · Updated last year)
- ☆21 · Nov 24, 2022 · Updated 3 years ago
- Implementation of "Commonsense Knowledge Aware Concept Selection for Diverse and Informative Visual Storytelling" (☆12 · Aug 19, 2021 · Updated 4 years ago)
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022) (☆36 · Nov 12, 2022 · Updated 3 years ago)
- ☆16 · Jul 7, 2023 · Updated 2 years ago
- New testing protocol for learning local patch descriptors on the Brown Phototour dataset (☆17 · Apr 14, 2025 · Updated 10 months ago)
- PANENE: Progressive Approximate NEarest NEighbors (☆20 · Feb 12, 2025 · Updated last year)
- Sapsucker Woods 60 Audiovisual Dataset (☆17 · Oct 7, 2022 · Updated 3 years ago)
- MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text. (☆953 · Mar 19, 2025 · Updated 11 months ago)
- ☆110 · Dec 23, 2022 · Updated 3 years ago
- VPEval codebase from "Visual Programming for Text-to-Image Generation and Evaluation" (NeurIPS 2023) (☆45 · Nov 29, 2023 · Updated 2 years ago)
- ☆180 · Nov 14, 2025 · Updated 3 months ago
- SOIT: Segmenting Objects with Instance-Aware Transformers (☆14 · Jun 6, 2022 · Updated 3 years ago)
- ☆11 · Sep 12, 2025 · Updated 5 months ago
- Repository for the paper "Teaching Structured Vision & Language Concepts to Vision & Language Models" (☆48 · Sep 25, 2023 · Updated 2 years ago)
- HTML5 Canvas Library (☆18 · Apr 29, 2017 · Updated 8 years ago)
- [ICCV 2023] You Only Look at One Partial Sequence (☆343 · Oct 21, 2023 · Updated 2 years ago)
- PyTorch implementation of Omni-DETR for omni-supervised object detection: https://arxiv.org/abs/2203.16089 (☆69 · Sep 26, 2022 · Updated 3 years ago)
- CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet (☆224 · Dec 16, 2022 · Updated 3 years ago)
- ☆19 · Dec 22, 2022 · Updated 3 years ago
- ☆20 · Feb 22, 2021 · Updated 5 years ago