VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Captioning Dataset.
☆78Dec 5, 2022Updated 3 years ago
Alternatives and similar repositories for videoCC-data
Users that are interested in videoCC-data are comparing it to the libraries listed below.
- ☆54Jul 31, 2022Updated 3 years ago
- Release of ImageNet-Captions☆51Jan 20, 2023Updated 3 years ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆159Dec 9, 2024Updated last year
- Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"☆61Jun 12, 2023Updated 3 years ago
- Code release for "Learning Video Representations from Large Language Models"☆534Oct 1, 2023Updated 2 years ago
- The code used to create the ARCA23K and ARCA23K-FSD datasets☆16Nov 9, 2021Updated 4 years ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"☆47Feb 19, 2026Updated 3 months ago
- VaLM: Visually-augmented Language Modeling. ICLR 2023.☆56Mar 6, 2023Updated 3 years ago
- ☆11Dec 8, 2022Updated 3 years ago
- LL3M: Large Language and Multi-Modal Model in Jax☆74Apr 23, 2024Updated 2 years ago
- [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset☆311Dec 25, 2024Updated last year
- Multi-modality pre-training☆512Mar 27, 2026Updated 2 months ago
- Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs"☆98Jun 6, 2025Updated last year
- Inverse DALL-E for Optical Character Recognition☆38Oct 14, 2022Updated 3 years ago
- Let's make a video clip☆97Jul 29, 2022Updated 3 years ago
- ☆19Dec 22, 2022Updated 3 years ago
- [ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa…☆76Feb 21, 2022Updated 4 years ago
- ☆180Nov 14, 2025Updated 7 months ago
- Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]☆139Apr 10, 2026Updated 2 months ago
- The implementation of Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling☆12Aug 19, 2021Updated 4 years ago
- DataComp: In search of the next generation of multimodal datasets☆782Apr 28, 2025Updated last year
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆953Mar 19, 2025Updated last year
- WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique imag…☆1,108Sep 27, 2024Updated last year
- ImageNet3D: Towards General-Purpose Object-Level 3D Understanding☆21Dec 6, 2024Updated last year
- Easily create large video dataset from video urls☆659Jul 30, 2024Updated last year
- Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models☆47Sep 25, 2023Updated 2 years ago
- ☆13Jul 20, 2024Updated last year
- Implementation of <Symbolic Graphics Programming with Large Language Models>☆38Sep 14, 2025Updated 9 months ago
- Attempt at cog wrapper for a SDXL CLIP Interrogator☆10May 16, 2024Updated 2 years ago
- ☆32May 3, 2024Updated 2 years ago
- This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts…☆298Feb 12, 2024Updated 2 years ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆23Nov 8, 2023Updated 2 years ago
- ☆58Apr 24, 2024Updated 2 years ago
- A task-agnostic vision-language architecture as a step towards General Purpose Vision☆92Jul 14, 2021Updated 4 years ago
- ☆109Dec 23, 2022Updated 3 years ago
- This is the official repository for the LENS (Large Language Models Enhanced to See) system.☆356Jul 22, 2025Updated 10 months ago
- SVIT: Scaling up Visual Instruction Tuning☆168Jun 20, 2024Updated last year
- Awesome Self-Supervised Vision Learning☆11Mar 27, 2024Updated 2 years ago
- CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet☆222Dec 16, 2022Updated 3 years ago