michelecafagna26 / vinvl-visualbackbone
Original VinVL visual backbone with simplified APIs for extracting features, boxes, and object detections in a few lines of Python code.
☆9 Updated 2 years ago
Alternatives and similar repositories for vinvl-visualbackbone
Users who are interested in vinvl-visualbackbone compare it to the libraries listed below.
- Extract features and bounding boxes using the original Bottom-up Attention Faster-RCNN in a few lines of Python code ☆11 Updated 2 years ago
- Original VinVL (and Oscar) repo with an API designed for easy inference ☆8 Updated 2 years ago
- text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image) ☆11 Updated 9 months ago
- NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings ☆55 Updated last year
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering ☆19 Updated 10 months ago
- A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022) ☆42 Updated 3 years ago
- ☆10 Updated last year
- Implementation for the paper "Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly" (ECCV 2022: https://arxiv.org/abs… ☆35 Updated 2 years ago
- [ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer. ☆14 Updated last year
- Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost ☆8 Updated last year
- Official implementation of AAAI24 paper "A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking" ☆8 Updated 10 months ago
- Mitigating Open-Vocabulary Caption Hallucinations (EMNLP 2024) ☆17 Updated 9 months ago
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week. ☆34 Updated 4 years ago
- Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language" ☆66 Updated 3 years ago
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆12 Updated 9 months ago
- ☆16 Updated last year
- Sapsucker Woods 60 Audiovisual Dataset ☆15 Updated 2 years ago
- ☆26 Updated 3 years ago
- Mixture of Attention Heads ☆48 Updated 2 years ago
- Official Repo for FoodieQA paper (EMNLP 2024) ☆16 Updated last month
- Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Tra… ☆32 Updated 3 years ago
- ☆22 Updated 2 years ago
- Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model ☆12 Updated last year
- This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles". ☆23 Updated 4 months ago
- ☆42 Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆31 Updated 2 years ago
- On the Effectiveness of Parameter-Efficient Fine-Tuning ☆38 Updated last year
- opentqa is an open framework for textbook question answering, which includes xtqa, mcan, cmr, mfb, and mutan. ☆11 Updated 4 years ago
- Multimodal Graph Network (MGN): Code repo, examples from the paper ☆25 Updated 4 years ago
- Pythonic wrappers for Cider/CiderD evaluation metrics. Provides CIDEr as well as CIDEr-D (CIDEr Defended) which is more robust to gaming … ☆13 Updated last year