Callione / LLaVA-MOSS2
A modified LLaVA framework for MOSS2 that turns MOSS2 into a multimodal model.
☆13 · Updated 4 months ago
Alternatives and similar repositories for LLaVA-MOSS2:
Users who are interested in LLaVA-MOSS2 are comparing it to the repositories listed below.
- ☆47 · Updated 2 weeks ago
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models ☆247 · Updated last month
- ☆60 · Updated 8 months ago
- Official repository of the MMDU dataset ☆83 · Updated 4 months ago
- [ACL'2024 Findings] GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation ☆47 · Updated 11 months ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆141 · Updated 3 weeks ago
- This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vi… ☆99 · Updated 4 months ago
- ☆14 · Updated last year
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆260 · Updated 2 weeks ago
- StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding ☆41 · Updated this week
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆45 · Updated 7 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆96 · Updated 5 months ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation? ☆37 · Updated 8 months ago
- ☆60 · Updated this week
- [AAAI 2025] Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning ☆27 · Updated 4 months ago
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo… ☆309 · Updated 5 months ago
- ☆38 · Updated 7 months ago
- Efficient Multimodal Large Language Models: A Survey ☆312 · Updated 6 months ago
- Visualizing the attention of vision-language models ☆104 · Updated 3 months ago
- The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate". ☆93 · Updated 2 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆346 · Updated 3 weeks ago
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆249 · Updated last month
- A Self-Training Framework for Vision-Language Reasoning ☆63 · Updated 3 weeks ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆95 · Updated 7 months ago
- Visual Instruction Tuning for Qwen2 Base Model ☆23 · Updated 7 months ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆96 · Updated 3 months ago
- ☆33 · Updated 7 months ago
- Code release for VTW (AAAI 2025 Oral) ☆32 · Updated 3 weeks ago