joez17 / ChatBridge

ChatBridge is an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations of paired data.

☆54

Alternatives and similar repositories for ChatBridge

Users who are interested in ChatBridge are comparing it to the libraries listed below.

RifleZhang / LLaVA-Hound-DPO
☆155 Updated last year
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆125 Updated 8 months ago
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆293 Updated last year
imagegridworth / IG-VLM
☆140 Updated last year
AoiDragon / POPE
[EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''
☆99 Updated 3 months ago
longvideobench / LongVideoBench
[NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆112 Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆181 Updated last year
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆51 Updated 9 months ago
opendatalab / HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆98 Updated last year
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆190 Updated last year
PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆134 Updated 2 years ago
scofield7419 / Video-of-Thought
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆170 Updated 9 months ago
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆165 Updated last year
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76 Updated last year
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 Updated 2 years ago
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆98 Updated last year
thunlp / Muffin
☆66 Updated last year
ChenDelong1999 / polite-flamingo
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
☆64 Updated 2 years ago
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆147 Updated last year
X-PLUG / mPLUG
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
☆97 Updated 2 years ago
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆76 Updated last year
RUCAIBox / POPE
The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''
☆233 Updated 3 months ago
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆300 Updated last year
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆105 Updated last year
TIGER-AI-Lab / UniIR
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
☆169 Updated last year
allenai / aokvqa
Official repository for the A-OKVQA dataset
☆104 Updated last year
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆96 Updated last year
DCDmllm / Momentor
☆80 Updated last year
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆277 Updated last year