ChangxinWang / BoFiCap
Bounding and Filling: A Fast and Flexible Framework for Image Captioning
☆9 (updated 11 months ago)
Alternatives and similar repositories for BoFiCap:
Users interested in BoFiCap are comparing it to the repositories listed below.
- The released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models" (☆32, updated last year)
- Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev… (☆34, updated 3 months ago)
- 🦩 Visual Instruction Tuning with Polite Flamingo: training multi-modal LLMs to be both clever and polite (AAAI-24 Oral) (☆64, updated last year)
- ☆22 (updated 2 years ago)
- ☆102 (updated 2 years ago)
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models (☆43, updated 9 months ago)
- VaLM: Visually-Augmented Language Modeling (ICLR 2023) (☆56, updated 2 years ago)
- Code for the paper "Unified Text-to-Image Generation and Retrieval" (☆14, updated 8 months ago)
- Official implementation of "Parameter-Efficient Fine-Tuning Design Spaces" (☆26, updated 2 years ago)
- [ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" (☆53, updated last year)
- [ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval (☆30, updated last year)
- Official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" (☆73, updated 4 months ago)
- [ACL 2024] TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild (☆47, updated last year)
- MaXM: a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi), … (☆13, updated last year)
- [ICLR 2023] Code for the ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…" (☆50, updated 9 months ago)
- A curated list of vision-and-language pre-training (VLP) resources (☆58, updated 2 years ago)
- ☆17 (updated 8 months ago)
- Vision-Language Pretraining & Efficient Transformer papers (☆14, updated 3 years ago)
- ☆33 (updated last year)
- Data for evaluating GPT-4V (☆11, updated last year)
- MoCLE: the first MLLM with MoE for instruction customization and generalization (https://arxiv.org/abs/2312.12379) (☆34, updated 11 months ago)
- A collection of multimodal dialogue system papers I have read, with partial notes (☆21, updated 2 years ago)
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (☆36, updated 2 years ago)
- [NAACL 2022] MCSE: Multimodal Contrastive Learning of Sentence Embeddings (☆55, updated 9 months ago)
- Improving Language Understanding from Screenshots (paper: https://arxiv.org/abs/2402.14073) (☆28, updated 8 months ago)
- Code to evaluate various multimodal large language models using different instructions across multiple multimoda… (☆27, updated 3 weeks ago)
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration (☆24, updated 3 months ago)
- Code for the ACL 2023 paper "Combo of Thinking and Observing for Outside-Knowledge VQA" (☆12, updated last year)
- Original VinVL (and Oscar) repo with an API designed for easy inference (☆8, updated last year)
- [AAAI 2024] Code and model for "UMIE: Unified Multimodal Information Extraction with Instruction Tuning" (☆34, updated 9 months ago)