VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Captioning Dataset.
☆78Dec 5, 2022Updated 3 years ago
Alternatives and similar repositories for videoCC-data
Users that are interested in videoCC-data are comparing it to the libraries listed below.
- ☆54Jul 31, 2022Updated 3 years ago
- Release of ImageNet-Captions☆51Jan 20, 2023Updated 3 years ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆159Dec 9, 2024Updated last year
- Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"☆61Jun 12, 2023Updated 2 years ago
- Code release for "Learning Video Representations from Large Language Models"☆533Oct 1, 2023Updated 2 years ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"☆47Feb 19, 2026Updated last month
- VaLM: Visually-augmented Language Modeling. ICLR 2023.☆56Mar 6, 2023Updated 3 years ago
- [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset☆308Dec 25, 2024Updated last year
- Multi-modality pre-training☆510Mar 27, 2026Updated 2 weeks ago
- Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs"☆95Jun 6, 2025Updated 10 months ago
- Inverse DALL-E for Optical Character Recognition☆38Oct 14, 2022Updated 3 years ago
- Let's make a video clip☆97Jul 29, 2022Updated 3 years ago
- Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time☆46Jun 11, 2024Updated last year
- ☆19Dec 22, 2022Updated 3 years ago
- [ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa…☆76Feb 21, 2022Updated 4 years ago
- ☆180Nov 14, 2025Updated 5 months ago
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)☆36Nov 12, 2022Updated 3 years ago
- Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]☆139Updated this week
- The implement of Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling☆12Aug 19, 2021Updated 4 years ago
- DataComp: In search of the next generation of multimodal datasets☆774Apr 28, 2025Updated 11 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆954Mar 19, 2025Updated last year
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25Nov 23, 2024Updated last year
- WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique imag…☆1,102Sep 27, 2024Updated last year
- Easily create large video dataset from video urls☆653Jul 30, 2024Updated last year
- Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models☆47Sep 25, 2023Updated 2 years ago
- ☆13Jul 20, 2024Updated last year
- Attempt at cog wrapper for a SDXL CLIP Interrogator☆10May 16, 2024Updated last year
- Sapsucker Woods 60 Audiovisual Dataset☆18Oct 7, 2022Updated 3 years ago
- ☆32May 3, 2024Updated last year
- This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts…☆294Feb 12, 2024Updated 2 years ago
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".☆120Oct 9, 2025Updated 6 months ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆23Nov 8, 2023Updated 2 years ago
- Source code for the paper 'Audio Captioning Transformer'☆56Jan 18, 2022Updated 4 years ago
- ☆58Apr 24, 2024Updated last year
- A task-agnostic vision-language architecture as a step towards General Purpose Vision☆92Jul 14, 2021Updated 4 years ago
- ☆110Dec 23, 2022Updated 3 years ago
- Code for the paper titled "CiT Curation in Training for Effective Vision-Language Data".☆77Jan 18, 2023Updated 3 years ago
- [ICCV 2023] You Only Look at One Partial Sequence☆343Oct 21, 2023Updated 2 years ago
- [CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》☆151Jun 7, 2023Updated 2 years ago