A task-agnostic vision-language architecture as a step towards General Purpose Vision
☆92Jul 14, 2021Updated 4 years ago
Alternatives and similar repositories for gpv-1
Users that are interested in gpv-1 are comparing it to the libraries listed below.
- ☆32Mar 7, 2022Updated 4 years ago
- ☆14Dec 25, 2020Updated 5 years ago
- ☆1,046Oct 3, 2022Updated 3 years ago
- PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners☆117Sep 15, 2022Updated 3 years ago
- PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)☆372Jul 29, 2023Updated 2 years ago
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)☆211Dec 18, 2022Updated 3 years ago
- Referring expression comprehension on ReferIt(RefClef)☆10Nov 28, 2016Updated 9 years ago
- A PyTorch implementation of computing mean average precision in parallel☆16Jul 7, 2022Updated 3 years ago
- Official code of *Towards Event-oriented Long Video Understanding*☆12Jul 26, 2024Updated last year
- ☆11Sep 7, 2020Updated 5 years ago
- ☆39Aug 25, 2022Updated 3 years ago
- project page for VinVL☆359Jul 26, 2023Updated 2 years ago
- Omnivore: A Single Model for Many Visual Modalities☆573Nov 12, 2022Updated 3 years ago
- Extract features and bounding boxes using the original Bottom-up Attention Faster-RCNN in a few lines of Python code☆11Sep 18, 2022Updated 3 years ago
- UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)☆90Jun 12, 2023Updated 2 years ago
- PyTorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)☆17Jan 12, 2023Updated 3 years ago
- A new framework for open-vocabulary object detection, based on maskrcnn-benchmark☆249Feb 11, 2023Updated 3 years ago
- General-purpose Visual Understanding Evaluation☆20Dec 21, 2023Updated 2 years ago
- A simple wrapper for lmdb. Support dict-like operations.☆23Apr 20, 2023Updated 3 years ago
- [ECCV2022] Rethinking Data Augmentation for Robust Visual Question Answering☆13Nov 23, 2022Updated 3 years ago
- Official code release for paper "Improving Confidence Estimates for Unfamiliar Examples" https://arxiv.org/abs/1804.03166☆12Aug 16, 2020Updated 5 years ago
- ☆35May 2, 2022Updated 4 years ago
- ☆10Jul 23, 2021Updated 4 years ago
- [ICCV 2021] Official code for "Learning to Generate Scene Graph from Natural Language Supervision"☆100Apr 4, 2023Updated 3 years ago
- [ICML 2024] Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations☆15Oct 28, 2023Updated 2 years ago
- ☆231Dec 18, 2023Updated 2 years ago
- Official repository for the General Robust Image Task (GRIT) Benchmark☆55Mar 29, 2023Updated 3 years ago
- VaLM: Visually-augmented Language Modeling. ICLR 2023.☆56Mar 6, 2023Updated 3 years ago
- ☆43Aug 9, 2022Updated 3 years ago
- Use CLIP to represent video for Retrieval Task☆70Mar 1, 2021Updated 5 years ago
- [ACL 2021] mTVR: Multilingual Video Moment Retrieval☆27Aug 20, 2022Updated 3 years ago
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm☆675Sep 19, 2022Updated 3 years ago
- The Intermediate Goal of the project is to train a GPT like architecture to learn to summarise reddit posts from human preferences, as th…☆12Jul 14, 2021Updated 4 years ago
- Shows visual grounding methods can be right for the wrong reasons! (ACL 2020)☆23Jun 26, 2020Updated 5 years ago
- ☆289Aug 14, 2025Updated 8 months ago
- [ICLR2022] Code for "Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph"☆54Oct 19, 2022Updated 3 years ago
- ☆87Mar 4, 2024Updated 2 years ago
- Project page for "Visual Grounding in Video for Unsupervised Word Translation" CVPR 2020☆43Apr 26, 2020Updated 6 years ago
- PyTorch implementation for our NeurIPS 2019 paper "TAB-VCR: Tags and Attributes based VCR Baselines" https://arxiv.org/abs/1910.14671☆19May 6, 2021Updated 5 years ago