Callione / LLaVA-MOSS2
A modified LLaVA framework for MOSS2 that turns MOSS2 into a multimodal model.
☆13 · Updated 7 months ago
Alternatives and similar repositories for LLaVA-MOSS2:
Users interested in LLaVA-MOSS2 are comparing it to the libraries listed below.
- A collection of omni-mllm · ☆25 · Updated last week
- Official repository of the MMDU dataset · ☆89 · Updated 6 months ago
- A repository for organizing papers, code, and other resources related to unified multimodal models · ☆173 · Updated this week
- The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate" · ☆98 · Updated 5 months ago
- Latest advances on reasoning of multimodal large language models (Multimodal R1 / Visual R1) · ☆32 · Updated 3 weeks ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning · ☆155 · Updated last month
- [EMNLP 2024 Findings] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…" · ☆92 · Updated 5 months ago
- [ACL'2024 Findings] GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation · ☆55 · Updated last year
- ☆72 · Updated 10 months ago
- [CVPR'2025] VoCo-LLaMA: The official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models" · ☆152 · Updated last month
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" · ☆162 · Updated 3 months ago
- Synth-Empathy: Towards High-Quality Synthetic Empathy Data · ☆15 · Updated 2 months ago
- An easy-to-use, scalable, and high-performance RLHF framework designed for multimodal models · ☆114 · Updated 3 weeks ago
- The Next Step Forward in Multimodal LLM Alignment · ☆145 · Updated last month
- [AAAI 2025] Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning · ☆31 · Updated 2 weeks ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? · ☆38 · Updated 10 months ago
- ☆47 · Updated 10 months ago
- Official PyTorch implementation of EMOVA (CVPR 2025, https://arxiv.org/abs/2409.18042) · ☆29 · Updated last month
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM · ☆267 · Updated 3 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs · ☆22 · Updated last week
- The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" · ☆27 · Updated this week
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency · ☆100 · Updated last month
- A repository that continuously updates the latest papers, technical reports, and benchmarks on multimodal reasoning · ☆35 · Updated last month
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" (https://arxiv.org/abs/2404.10501) · ☆55 · Updated 9 months ago
- ☆30 · Updated 4 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer · ☆223 · Updated last year
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies · ☆42 · Updated last week
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models · ☆56 · Updated last month
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme · ☆115 · Updated 2 weeks ago
- A project for tri-modal LLM benchmarking and instruction tuning · ☆32 · Updated last month