NVlabs / OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
☆549 Updated 3 weeks ago
Alternatives and similar repositories for OmniVinci
Users who are interested in OmniVinci compare it to the repositories listed below.
- ☆78 Updated 6 months ago
- AudioStory: Generating Long-Form Narrative Audio with Large Language Models ☆288 Updated 2 months ago
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ☆534 Updated 3 weeks ago
- This is the official repo for the paper "LongCat-Flash-Omni Technical Report" ☆393 Updated last week
- NEO Series: Native Vision-Language Models from First Principles ☆223 Updated last month
- StreamingVLM: Real-Time Understanding for Infinite Video Streams ☆716 Updated last month
- An open-source implementation of Whisper ☆455 Updated 3 weeks ago
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024) ☆323 Updated last month
- An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning" ☆142 Updated 2 weeks ago
- ☆180 Updated 9 months ago
- Kyutai with an "eye" ☆223 Updated 7 months ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d… ☆116 Updated last month
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation ☆656 Updated last month
- MiMo-VL ☆591 Updated 3 months ago
- ☆570 Updated last week
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆93 Updated 2 weeks ago
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆304 Updated 3 weeks ago
- A Scientific Multimodal Foundation Model ☆607 Updated last month
- Official implementation of "Continuous Autoregressive Language Models" ☆584 Updated last week
- The official GitHub Page for MiniMax ☆60 Updated last week
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆296 Updated 5 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆290 Updated 6 months ago
- Fully Open Framework for Democratized Multimodal Training ☆614 Updated last week
- A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing emotion, speaking style, and paralinguistics… ☆650 Updated this week
- ☆156 Updated 2 weeks ago
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing" ☆148 Updated 2 months ago
- ☆313 Updated last week
- Liquid Audio - Speech-to-Speech audio models by Liquid AI ☆250 Updated last month
- ☆137 Updated 3 months ago
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆272 Updated last week