michelecafagna26 / faster-rcnn-bottom-up-pyLinks

Extract features and bounding boxes using the original Bottom-up Attention Faster-RCNN in a few lines of Python code

☆11

Alternatives and similar repositories for faster-rcnn-bottom-up-py

Users that are interested in faster-rcnn-bottom-up-py are comparing it to the libraries listed below

Sorting:

michelecafagna26 / vinvl-visualbackbone
Original VinVL visual backbone with simplified APIs to easily extract features, boxes, object detections, in a few lines of Python code.
☆9Updated 2 years ago
michelecafagna26 / VinVL
Original VinVL (and Oscar) repo with API designed for an easy inference
☆8Updated 2 years ago
woojeongjin / FewVLM
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022)
☆42Updated 3 years ago
zhiyuanhubj / Long_form_VideoQA
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
☆19Updated 10 months ago
jacobswan1 / ViTCAP
Implementation for CVPR 2022 paper " Injecting Semantic Concepts into End-to-End Image Captionin".
☆43Updated 3 years ago
AndersonStra / MuKEA
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
☆96Updated 2 years ago
guilk / KAT
Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language"
☆66Updated 3 years ago
layer6ai-labs / SGG-Seq2Seq
Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"
☆43Updated 3 years ago
iOPENCap / awesome-unimodal-training
text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image)
☆11Updated 9 months ago
aditya10 / VLC-BERT
Code for WACV 2023 paper "VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge"
☆21Updated 2 years ago
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆62Updated last week
facebookresearch / reliable_vqa
Implementation for the paper "Reliable Visual Question Answering Abstain Rather Than Answer Incorrectly" (ECCV 2022: https//arxiv.org/abs…
☆35Updated 2 years ago
MrZilinXiao / AutoVER
[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.
☆14Updated last year
limanling / clip-event
☆104Updated 3 years ago
YuJungHeo / kbvqa-public
☆39Updated 2 years ago
McGill-NLP / imagecode
Code and data for ImageCoDe, a contextual vison-and-language benchmark
☆40Updated last year
gicheonkang / gst-visdial
Official PyTorch Implementation for CVPR'23 Paper, "The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training"
☆20Updated last year
buxiangzhiren / DDCap
☆84Updated 2 years ago
phellonchen / awesome-visual-dialog
Recent Advances in Visual Dialog
☆30Updated 2 years ago
YulongBonjour / SimVLM
SimVLM ---SIMPLE VISUAL LANGUAGE MODEL PRETRAINING WITH WEAK SUPERVISION
☆36Updated 2 years ago
sushizixin / CLIP4IDC
CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)
☆34Updated 2 years ago
guoyang9 / UnifER
Official implementation for the MM'22 paper.
☆13Updated 3 years ago
sail-sg / VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
☆48Updated 2 years ago
pleaseconnectwifi / DANCE
PyTorch code for Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles (DANCE)
☆23Updated 2 years ago
ZihaoW123 / UniMM
Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog"
☆13Updated 2 years ago
fawazsammani / nlxgpt
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral)
☆48Updated last year
GT-RIPL / Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆59Updated 2 years ago
jianjieluo / SCD-Net
[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m…
☆64Updated last year
Heidelberg-NLP / VALSE
Data repository for the VALSE benchmark.
☆37Updated last year
HAWLYQ / InfoMetIC
☆14Updated last year