allenai / gpv-1
A task-agnostic vision-language architecture as a step towards General Purpose Vision
☆92Updated 3 years ago
Alternatives and similar repositories for gpv-1
Users who are interested in gpv-1 are comparing it to the repositories listed below.
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training☆137Updated 2 years ago
- ☆61Updated 3 years ago
- Official repository for the General Robust Image Task (GRIT) Benchmark☆54Updated 2 years ago
- Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"☆111Updated 5 years ago
- Code for the paper "CiT: Curation in Training for Effective Vision-Language Data".☆78Updated 2 years ago
- ☆83Updated 3 years ago
- Command-line tool for downloading and extending the RedCaps dataset.☆48Updated last year
- [ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"☆134Updated 2 years ago
- [BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"☆55Updated 2 years ago
- Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time☆45Updated last year
- Code release for the research paper "Exploring Long-Sequence Masked Autoencoders"☆100Updated 2 years ago
- ☆32Updated 3 years ago
- [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos☆122Updated last year
- PyTorch code for MUST☆107Updated last month
- [NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images☆58Updated 3 years ago
- ☆64Updated last year
- A PyTorch implementation of Open Vocabulary Object Detection with Pseudo Bounding-Box Labels☆61Updated 2 years ago
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities☆78Updated 3 years ago
- UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)☆87Updated 2 years ago
- Localized Narratives☆84Updated 3 years ago
- This is an official PyTorch implementation of Learning To Recognize Procedural Activities with Distant Supervision. In this repository, w…☆42Updated 2 years ago
- [arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"☆130Updated 10 months ago
- Introduction and scripts for the paper "PartImageNet: A Large, High-Quality Dataset of Parts" (Ju He, Shuo Yang, Shaokang Yang, Adam Kort…☆128Updated 3 months ago
- [ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning☆63Updated 2 years ago
- Reliably download millions of images efficiently☆116Updated 4 years ago
- PyTorch implementation of the paper "MILAN: Masked Image Pretraining on Language Assisted Representation" https://arxiv.org/pdf/2208.0604…☆83Updated 2 years ago
- Patching open-vocabulary models by interpolating weights☆91Updated last year
- ☆50Updated 2 years ago
- ☆55Updated 2 years ago
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…☆78Updated 2 years ago