westlake-baichuan-mllm / bc-omni

Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊

☆267

Alternatives and similar repositories for bc-omni

Users who are interested in bc-omni are comparing it to the libraries listed below.

baichuan-inc / Baichuan-Omni-1.5
☆164 Updated 5 months ago
RLHF-V / RLAIF-V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
☆392 Updated 2 months ago
MiniMax-AI / One-RL-to-See-Them-All
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
☆303 Updated 2 months ago
LengSicong / MMR1
MMR1: Advancing the Frontiers of Multimodal Reasoning
☆162 Updated 4 months ago
Victorwz / Open-Qwen2VL
[COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
☆242 Updated 2 months ago
VITA-MLLM / Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
☆333 Updated 2 months ago
emova-ollm / EMOVA
Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)
☆59 Updated 4 months ago
WePOINTS / WePOINTS
☆173 Updated 5 months ago
RainBowLuoCS / OpenOmni
OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea…
☆91 Updated last month
infinigence / Infini-Megrez-Omni
☆235 Updated 5 months ago
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆382 Updated 2 months ago
phellonchen / X-LLM
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
☆312 Updated last year
step-law / steplaw
☆196 Updated 3 months ago
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆267 Updated last year
bytedance / Valley
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
☆245 Updated 5 months ago
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆138 Updated 3 months ago
xiaomi-research / r1-aqa
🤗 R1-AQA Model: mispeech/r1-aqa
☆281 Updated 4 months ago
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆207 Updated 6 months ago
OpenGVLab / MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
☆230 Updated last year
invictus717 / MiCo
[ICCV'25] Explore the Limits of Omni-modal Pretraining at Scale
☆111 Updated 11 months ago
EvolvingLMMs-Lab / multimodal-search-r1
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…
☆268 Updated last month
RhapsodyAILab / MiniCPM-V-Embedding
☆29 Updated 11 months ago
threegold116 / Awesome-Omni-MLLMs
This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
☆44 Updated last week
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆385 Updated 4 months ago
boson-ai / RPBench-Auto
An automated pipeline for evaluating LLMs for role-playing.
☆192 Updated 10 months ago
SkyworkAI / Skywork-Reward-V2
Scaling Preference Data Curation via Human-AI Synergy
☆94 Updated 3 weeks ago
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆287 Updated 10 months ago
XiaomiMiMo / MiMo-VL
MiMo-VL
☆469 Updated last week
DAMO-NLP-SG / multimodal_textbook
[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆167 Updated 4 months ago
HyperGAI / HPT
HPT - Open Multimodal LLMs from HyperGAI
☆315 Updated last year