A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
☆38 Apr 7, 2025 Updated last year
Alternatives and similar repositories for OpenOmniNexus
Users that are interested in OpenOmniNexus are comparing it to the libraries listed below.
- Reproduction of the llama-omni training code ☆73 Jan 23, 2025 Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆18 Apr 2, 2025 Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆23 Apr 10, 2026 Updated 2 months ago
- ☆273 May 19, 2025 Updated last year
- Expression Snippet Transformer for Robust Video-based Facial Expression Recognition ☆17 Jan 27, 2024 Updated 2 years ago
- [ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information ☆16 Oct 27, 2024 Updated last year
- A Mechanistic View on Video Generation as World Models: State and Dynamics ☆44 May 18, 2026 Updated last month
- (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align… ☆140 May 9, 2026 Updated last month
- ☆33 Jan 14, 2023 Updated 3 years ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg… ☆48 Mar 2, 2026 Updated 3 months ago
- Quick Long Video Understanding [TMLR2025] ☆78 Oct 27, 2025 Updated 7 months ago
- ☆45 Apr 2, 2025 Updated last year
- Neural Homomorphic Vocoder optimized for singing voice synthesis ☆37 May 2, 2026 Updated last month
- Fine-tune the LLM part of the spark-tts model ☆124 Mar 25, 2025 Updated last year
- [NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding ☆22 Oct 10, 2024 Updated last year
- A lightweight, efficient variation of the StyleTTS 2 text-to-speech model. ☆50 May 22, 2025 Updated last year
- Official Repository of LatentSeek ☆83 Jun 6, 2025 Updated last year
- Drax: Speech Recognition with Discrete Flow Matching ☆75 Oct 15, 2025 Updated 8 months ago
- ☆19 Nov 16, 2023 Updated 2 years ago
- An official PyTorch implementation of "Certifiably Robust Graph Contrastive Learning" (NeurIPS 2023) ☆11 Jan 22, 2024 Updated 2 years ago
- ☆51 Sep 3, 2025 Updated 9 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆48 Mar 3, 2025 Updated last year
- ☆35 Oct 23, 2025 Updated 7 months ago
- Monte Carlo Tree Search Self-Refine (MCTSr) ☆22 Jul 6, 2024 Updated last year
- ☆22 Apr 6, 2025 Updated last year
- Code for Retrieval-Augmented Perception (ICML 2025) ☆71 Apr 22, 2026 Updated last month
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization ☆109 May 20, 2025 Updated last year
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆257 Mar 26, 2025 Updated last year
- An official implementation of Style-Talker for Spoken Dialogue Generation ☆23 Jan 12, 2025 Updated last year
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model ☆1,038 Jan 15, 2026 Updated 5 months ago
- A small rust-based data loader ☆37 Feb 20, 2026 Updated 3 months ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs) ☆41 Aug 11, 2025 Updated 10 months ago
- Full Text Search Over Probabilistic Lattices with Elasticsearch! ☆10 Nov 20, 2020 Updated 5 years ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆111 Mar 14, 2025 Updated last year
- ☆10 Oct 28, 2020 Updated 5 years ago
- Vero: An Open RL Recipe for General Visual Reasoning ☆123 Jun 3, 2026 Updated 2 weeks ago
- CVPR 2021 Oral Paper PatchGenCN ☆11 Oct 28, 2021 Updated 4 years ago
- Multimodal Open Source Framework for Conversational Agent Research and Development. ☆27 Feb 16, 2025 Updated last year
- ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations ☆21 Sep 21, 2025 Updated 8 months ago