ChangxinWang / BoFiCap
Bounding and Filling: A Fast and Flexible Framework for Image Captioning
☆9 (updated 11 months ago)
Alternatives and similar repositories for BoFiCap:
Users interested in BoFiCap are comparing it to the repositories listed below.
- The released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models" (☆32, updated last year)
- Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev… (☆34, updated 3 months ago)
- 🦩 Visual Instruction Tuning with Polite Flamingo: training multi-modal LLMs to be both clever and polite (AAAI-24 Oral) (☆64, updated last year)
- ☆22 (updated 2 years ago)
- ☆102 (updated 2 years ago)
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models (☆43, updated 9 months ago)
- VaLM: Visually-Augmented Language Modeling (ICLR 2023) (☆56, updated 2 years ago)
- Code for the paper "Unified Text-to-Image Generation and Retrieval" (☆14, updated 8 months ago)
- Official implementation of "Parameter-Efficient Fine-Tuning Design Spaces" (☆26, updated 2 years ago)
- [ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" (☆53, updated last year)
- [ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval (☆30, updated last year)
- Official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" (☆73, updated 4 months ago)
- [ACL 2024] TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild (☆47, updated last year)
- MaXM: a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi), … (☆13, updated last year)
- [ICLR 2023] Code for the ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…" (☆50, updated 9 months ago)
- A curated list of vision-and-language pre-training (VLP) resources (☆58, updated 2 years ago)
- ☆17 (updated 8 months ago)
- Vision-Language Pretraining & Efficient Transformer papers (☆14, updated 3 years ago)
- ☆33 (updated last year)
- Data for evaluating GPT-4V (☆11, updated last year)
- MoCLE: the first MLLM with MoE for instruction customization and generalization (https://arxiv.org/abs/2312.12379) (☆34, updated 11 months ago)
- A collection of multimodal dialogue system papers I have read, with partial notes (☆21, updated 2 years ago)
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (☆36, updated 2 years ago)
- [NAACL 2022] MCSE: Multimodal Contrastive Learning of Sentence Embeddings (☆55, updated 9 months ago)
- Improving Language Understanding from Screenshots (paper: https://arxiv.org/abs/2402.14073) (☆28, updated 8 months ago)
- Code to evaluate various multimodal large language models using different instructions across multiple multimoda… (☆27, updated 3 weeks ago)
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration (☆24, updated 3 months ago)
- Code for the ACL 2023 paper "Combo of Thinking and Observing for Outside-Knowledge VQA" (☆12, updated last year)
- Original VinVL (and Oscar) repo with an API designed for easy inference (☆8, updated last year)
- [AAAI 2024] Code and model for "UMIE: Unified Multimodal Information Extraction with Instruction Tuning" (☆34, updated 9 months ago)