baichuan-inc / Baichuan-Omni-1.5
☆130 · Updated last month
Alternatives and similar repositories for Baichuan-Omni-1.5:
Users interested in Baichuan-Omni-1.5 are comparing it to the repositories listed below.
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆265 · Updated 2 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ☆297 · Updated 2 months ago
- Long Context Transfer from Language to Vision ☆368 · Updated last week
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆200 · Updated 2 months ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆145 · Updated last week
- ☆218 · Updated last month
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems ☆81 · Updated last year
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs ☆69 · Updated this week
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆163 · Updated last month
- Explore the Limits of Omni-modal Pretraining at Scale ☆97 · Updated 6 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆224 · Updated last month
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆148 · Updated last week
- ☆73 · Updated last year
- Reproduction of the llama-omni training code ☆57 · Updated 2 months ago
- 🔥🔥First-ever hour scale video understanding models ☆259 · Updated this week
- Explore the Multimodal “Aha Moment” on 2B Model ☆524 · Updated last week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆96 · Updated last month
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024) ☆158 · Updated 7 months ago
- Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆47 · Updated 2 months ago
- ☆172 · Updated last month
- [CVPR'25] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆326 · Updated 3 weeks ago
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation ☆80 · Updated last week
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆75 · Updated 5 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆145 · Updated this week
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆40 · Updated last week
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought ☆211 · Updated 2 months ago
- A Simple Framework of Small-scale Large Multimodal Models for Video Understanding Based on TinyLLaVA_Factory. ☆46 · Updated last week
- ☆60 · Updated last week
- HumanOmni ☆129 · Updated 2 weeks ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆117 · Updated 4 months ago