baichuan-inc / Baichuan-Omni-1.5
☆185 · Feb 8, 2025 · Updated last year
Alternatives and similar repositories for Baichuan-Omni-1.5
Users who are interested in Baichuan-Omni-1.5 are comparing it to the libraries listed below.
- ☆29 · Mar 12, 2025 · Updated 11 months ago
- ✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction · ☆2,487 · Mar 28, 2025 · Updated 10 months ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts · ☆21 · Dec 22, 2025 · Updated last month
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction · ☆217 · Feb 28, 2025 · Updated 11 months ago
- [ICCV2025] WikiAutoGen official page · ☆24 · Feb 6, 2026 · Updated last week
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 · ☆272 · Jan 27, 2025 · Updated last year
- Ola: Pushing the Frontiers of Omni-Modal Language Model · ☆385 · Jun 13, 2025 · Updated 8 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs · ☆38 · Jan 26, 2026 · Updated 3 weeks ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" · ☆45 · Jul 1, 2025 · Updated 7 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM · ☆365 · May 27, 2025 · Updated 8 months ago
- Your faithful, impartial partner for authentic audio evaluation: know yourself, know your rivals. · ☆275 · Feb 3, 2026 · Updated 2 weeks ago
- (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align… · ☆125 · Nov 8, 2025 · Updated 3 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" · ☆31 · Dec 23, 2024 · Updated last year
- [ACL 2025 Findings] Implicit Reasoning in Transformers is Reasoning through Shortcuts · ☆17 · Mar 11, 2025 · Updated 11 months ago
- ☆19 · Jun 29, 2025 · Updated 7 months ago
- This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities · ☆90 · Jan 3, 2026 · Updated last month
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. · ☆113 · Jul 27, 2024 · Updated last year
- A project for tri-modal LLM benchmarking and instruction tuning. · ☆56 · Mar 27, 2025 · Updated 10 months ago
- This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'. · ☆15 · Feb 12, 2024 · Updated 2 years ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… · ☆3,919 · Jun 12, 2025 · Updated 8 months ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models · ☆994 · Dec 15, 2025 · Updated 2 months ago
- ☆16 · Jul 23, 2024 · Updated last year
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya · ☆125 · Aug 7, 2025 · Updated 6 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" · ☆10 · Jul 19, 2024 · Updated last year
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" · ☆191 · Mar 17, 2025 · Updated 11 months ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model · ☆3,140 · Dec 5, 2024 · Updated last year
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025 · ☆276 · May 26, 2025 · Updated 8 months ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs · ☆24 · Sep 26, 2024 · Updated last year
- ☆21 · Jul 25, 2025 · Updated 6 months ago
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" · ☆870 · Aug 27, 2024 · Updated last year
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a… · ☆646 · Jun 9, 2024 · Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model · ☆42 · Aug 4, 2024 · Updated last year
- The official implementation of Cross-Task Experience Sharing (COPS) · ☆29 · Oct 23, 2024 · Updated last year
- The official implementation of Self-Exploring Language Models (SELM) · ☆63 · Jun 4, 2024 · Updated last year
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity · ☆22 · Aug 28, 2025 · Updated 5 months ago
- ☆16 · Sep 17, 2024 · Updated last year
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models · ☆1,574 · Nov 16, 2025 · Updated 3 months ago
- ☆486 · May 6, 2025 · Updated 9 months ago