westlake-baichuan-mllm / bc-omni
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM
⭐267 · Updated 5 months ago
Alternatives and similar repositories for bc-omni
Users interested in bc-omni are comparing it to the libraries listed below.
- ⭐164 · Updated 5 months ago
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ⭐385 · Updated last month
- The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning ⭐287 · Updated last month
- MMR1: Advancing the Frontiers of Multimodal Reasoning ⭐162 · Updated 3 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ⭐328 · Updated last month
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042) ⭐55 · Updated 3 months ago
- Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources ⭐231 · Updated last month
- ⭐232 · Updated 4 months ago
- ⭐451 · Updated last week
- ⭐173 · Updated 5 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ⭐374 · Updated 2 months ago
- ⭐193 · Updated 2 months ago
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ⭐153 · Updated 2 months ago
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages ⭐312 · Updated last year
- Long Context Transfer from Language to Vision ⭐384 · Updated 3 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ⭐206 · Updated 6 months ago
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ⭐88 · Updated 2 weeks ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ⭐241 · Updated last week
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ⭐283 · Updated 10 months ago
- Explore the Multimodal "Aha Moment" on 2B Model ⭐596 · Updated 3 months ago
- An automated pipeline for evaluating LLMs for role-playing. ⭐189 · Updated 9 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ⭐266 · Updated last year
- [ICCV 2025] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ⭐164 · Updated 3 months ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,… ⭐308 · Updated 4 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa ⭐274 · Updated 3 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ⭐133 · Updated 3 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ⭐178 · Updated 3 weeks ago
- ⭐366 · Updated 5 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] ⭐291 · Updated this week
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ⭐244 · Updated 4 months ago