westlake-baichuan-mllm / bc-omni
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM
⭐265 · Updated last month
Alternatives and similar repositories for bc-omni:
Users that are interested in bc-omni are comparing it to the libraries listed below
- [CVPR'25] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ⭐308 · Updated 2 weeks ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ⭐290 · Updated 2 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ⭐326 · Updated 4 months ago
- ⭐218 · Updated last month
- Long Context Transfer from Language to Vision ⭐368 · Updated this week
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ⭐199 · Updated 2 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ⭐257 · Updated 8 months ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer ⭐369 · Updated 2 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ⭐221 · Updated 3 weeks ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ⭐266 · Updated 6 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ⭐220 · Updated 11 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR25] ⭐158 · Updated this week
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ⭐274 · Updated this week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ⭐96 · Updated 3 weeks ago
- Efficient Multimodal Large Language Models: A Survey ⭐326 · Updated 2 weeks ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ⭐226 · Updated last month
- ⭐335 · Updated last month
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper ⭐128 · Updated 8 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ⭐60 · Updated 4 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ⭐234 · Updated 2 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ⭐246 · Updated 3 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ⭐146 · Updated 2 months ago
- ⭐171 · Updated last month