ChenDelong1999 / polite-flamingo

🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)

☆63

Alternatives and similar repositories for polite-flamingo

Users who are interested in polite-flamingo are comparing it to the repositories listed below.

PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆134 Updated 2 years ago
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
thunlp / Muffin
☆66 Updated last year
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆163 Updated last year
palchenli / VL-Instruction-Tuning
☆91 Updated last year
PVIT-official / PVIT
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
☆37 Updated 2 years ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 Updated 2 years ago
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83 Updated last year
RifleZhang / LLaVA-Hound-DPO
☆155 Updated 11 months ago
zhjohnchan / SK-VG
[CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.
☆32 Updated 2 years ago
AoiDragon / POPE
[EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
☆93 Updated 2 months ago
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆88 Updated 2 years ago
HYPJUDY / Sparkles
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
☆44 Updated last year
zengyan-97 / X2-VLM
All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)
☆165 Updated last year
mlfoundations / VisIT-Bench
☆50 Updated last year
Hxyou / IdealGPT
Official Code of IdealGPT
☆35 Updated 2 years ago
vlf-silkie / VLFeedback
☆100 Updated last year
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76 Updated last year
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆287 Updated last year
cambridgeltl / visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
☆130 Updated 2 years ago
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆94 Updated last year
yangbang18 / MultiCapCLIP
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆36 Updated last year
bcdnlp / FAITHSCORE
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆30 Updated 7 months ago
SihengLi99 / TextBind
[2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
☆46 Updated 2 years ago
YiyangZhou / LURE
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆149 Updated last year
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆274 Updated last year
open-vision-language / oven
☆40 Updated 2 years ago
AGI-Edgerunners / IIL
Code for our Paper "All in an Aggregated Image for In-Image Learning"
☆29 Updated last year
MichaelZhouwang / VLUE
This repo contains code and instructions for baselines in the VLUE benchmark.
☆41 Updated 3 years ago
joez17 / ChatBridge
ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…
☆53 Updated 2 years ago