michelecafagna26 / vinvl-visualbackbone
Original VinVL visual backbone with simplified APIs to extract features, boxes, and object detections in a few lines of Python code.
☆9Updated 2 years ago
Alternatives and similar repositories for vinvl-visualbackbone
Users who are interested in vinvl-visualbackbone are comparing it to the libraries listed below
- Extract features and bounding boxes using the original Bottom-up Attention Faster-RCNN in a few lines of Python code☆11Updated 2 years ago
- Original VinVL (and Oscar) repo with an API designed for easy inference☆8Updated 2 years ago
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering☆19Updated 9 months ago
- A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022)☆42Updated 3 years ago
- NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings☆55Updated last year
- [ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.☆14Updated last year
- Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost☆8Updated last year
- ☆22Updated 11 months ago
- Official implementation of AAAI24 paper "A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking"☆8Updated 9 months ago
- EACL 2023 paper "MLASK: Multimodal Summarization of Video-based News Articles"☆12Updated last year
- Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase…☆13Updated last year
- Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model☆12Updated last year
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),…☆13Updated last year
- Code and data for ImageCoDe, a contextual vision-and-language benchmark☆40Updated last year
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral)☆48Updated last year
- Code for EMNLP 2022 paper “Distilled Dual-Encoder Model for Vision-Language Understanding”☆30Updated 2 years ago
- ☆42Updated last year
- ☆16Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".☆12Updated 9 months ago
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Updated 3 years ago
- ☆11Updated 5 months ago
- Synthesizing realistic and diverse text-datasets from augmented LLMs☆13Updated 3 months ago
- ☆16Updated last year
- Implementation for the paper "Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly" (ECCV 2022: https://arxiv.org/abs…☆34Updated 2 years ago
- Source code for InBedder, an instruction-following text embedder☆27Updated 9 months ago
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)☆30Updated 3 years ago
- [EMNLP'23] Code for 'Rethinking Negative Pairs in Code Search'☆12Updated last year
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc…☆32Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆15Updated last year
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…☆23Updated last week