JiuTian-VL / JiuTian-LION

[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

☆153

Alternatives and similar repositories for JiuTian-LION

Users who are interested in JiuTian-LION compare it to the repositories listed below.

Liuziyu77 / RAR
The official implementation of RAR
☆92 Updated last year
SY-Xuan / Pink
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
☆95 Updated 10 months ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180 Updated last year
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76 Updated last year
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆146 Updated last month
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆277 Updated last year
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆164 Updated last year
thunlp / Muffin
☆66 Updated last year
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆142 Updated last year
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆77 Updated last year
palchenli / VL-Instruction-Tuning
☆91 Updated 2 years ago
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆291 Updated last year
NExT-ChatV / NExT-Chat
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
☆255 Updated last year
zai-org / CogCoM
☆215 Updated last year
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆91 Updated this week
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
☆138 Updated 3 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆168 Updated last year
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
OpenGVLab / MMT-Bench
[ICML 2024] | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
☆115 Updated last year
swordlidev / Evaluation-Multimodal-LLMs-Survey
A Survey on Benchmarks of Multimodal Large Language Models
☆145 Updated 5 months ago
Code-kunkun / LamRA
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
☆172 Updated 4 months ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆189 Updated 7 months ago
zhourax / VEGA
☆37 Updated last year
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆298 Updated last year
PhoenixZ810 / MG-LLaVA
Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).
☆158 Updated last year
Meituan-AutoML / Lenna
☆86 Updated last year
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆98 Updated last year
LALBJ / PAI
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
☆151 Updated last year
WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆331 Updated last year
TobyYang7 / Llava_Qwen2
Visual Instruction Tuning for Qwen2 Base Model
☆40 Updated last year