"Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning"
☆55Jan 28, 2026Updated last month
Alternatives and similar repositories for Omni-R1
Users that are interested in Omni-R1 are comparing it to the libraries listed below
- ☆12Mar 11, 2025Updated last year
- TBD☆49Updated this week
- Official implementation of FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment☆34Feb 24, 2026Updated 3 weeks ago
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs☆76Dec 3, 2025Updated 3 months ago
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams☆47Updated this week
- The demo page for ALMTokenizer☆59Apr 14, 2025Updated 11 months ago
- An official implementation of Style-Talker for Spoken Dialogue Generation☆23Jan 12, 2025Updated last year
- Residual Context Diffusion (RCD): Repurposing discarded signals as structured priors for high-performance reasoning in dLLMs.☆57Mar 12, 2026Updated last week
- The evaluation code for A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5☆53Jan 18, 2026Updated 2 months ago
- Open Ended Medical Reinforcement Learning☆35Updated this week
- [CVPR 2026] Official code of "EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding"☆47Mar 7, 2026Updated last week
- Official Implementation of "ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedbac…☆44Jan 23, 2026Updated last month
- Whisper Speech Quality Assessment (WhiSQA)☆16Oct 14, 2025Updated 5 months ago
- ☆32Jan 30, 2026Updated last month
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".☆64Nov 5, 2025Updated 4 months ago
- daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently☆34Feb 4, 2026Updated last month
- ComfyUI version of WithAnyone☆24Dec 18, 2025Updated 3 months ago
- Spatial Aptitude Training for Multimodal Language Models☆24Feb 8, 2026Updated last month
- "A Survey on Agent-as-a-Judge"☆98Jan 12, 2026Updated 2 months ago
- small audio language model for reasoning☆86Dec 4, 2025Updated 3 months ago
- ☆35Updated this week
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆126Jan 30, 2026Updated last month
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis☆147Jan 1, 2025Updated last year
- "AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation"☆38Jan 27, 2026Updated last month
- The official repository of paper "Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark"☆20Jun 20, 2025Updated 8 months ago
- ☆29Feb 24, 2026Updated 3 weeks ago
- AnyEnhance-based Baseline for the CCF-AATC 2025 Challenge Track 1☆46Dec 27, 2025Updated 2 months ago
- Collection of scripts from mHuBERT-147.☆32Nov 19, 2024Updated last year
- SPAgent, a foundation agent for understanding, reasoning over, and operating within the physical and spatial world.☆150Updated this week
- ☆11Oct 31, 2024Updated last year
- ☆21Aug 9, 2024Updated last year
- A neural speech codec based on discrete WavLM representations☆25Aug 28, 2024Updated last year
- Official code of SenSE.☆76Oct 30, 2025Updated 4 months ago
- Code for CVPR 2024 Oral "Neural Lineage"☆17Jun 18, 2024Updated last year
- Green-VLA: Staged Vision-Language-Action Model for Generalist Robots☆109Mar 5, 2026Updated 2 weeks ago
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration☆65Feb 21, 2025Updated last year
- ☆60Feb 6, 2026Updated last month
- An unofficial implementation of "UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding".☆26Nov 4, 2023Updated 2 years ago
- Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”☆18Jan 27, 2026Updated last month