☆109 · May 15, 2025 · Updated 9 months ago
Alternatives and similar repositories for Orthus
Users who are interested in Orthus are comparing it to the libraries listed below.
- ☆39 · May 20, 2025 · Updated 9 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆57 · Mar 6, 2025 · Updated 11 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆187 · May 21, 2025 · Updated 9 months ago
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis ☆131 · May 16, 2025 · Updated 9 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆103 · Jul 18, 2025 · Updated 7 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆418 · Apr 25, 2025 · Updated 10 months ago
- ☆15 · Dec 20, 2024 · Updated last year
- [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding ☆512 · Nov 14, 2025 · Updated 3 months ago
- (ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training ☆13 · Feb 15, 2025 · Updated last year
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu… ☆105 · Apr 23, 2025 · Updated 10 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆438 · Aug 8, 2025 · Updated 6 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆141 · Feb 9, 2026 · Updated 3 weeks ago
- Official Implementation of Paper Transfer between Modalities with MetaQueries ☆304 · Oct 12, 2025 · Updated 4 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆43 · Apr 10, 2025 · Updated 10 months ago
- The code repository of UniRL ☆51 · May 30, 2025 · Updated 9 months ago
- The official pytorch implementation of “Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization”. ☆19 · May 22, 2025 · Updated 9 months ago
- Code for orthogonal neural operator ☆18 · Oct 15, 2023 · Updated 2 years ago
- d3LLM: Ultra-Fast Diffusion LLM 🚀 ☆93 · Feb 4, 2026 · Updated 3 weeks ago
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆800 · Oct 10, 2025 · Updated 4 months ago
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders ☆42 · Jun 10, 2025 · Updated 8 months ago
- Co-Reinforcement Learning for Unified Multimodal Understanding and Generation ☆39 · Jul 22, 2025 · Updated 7 months ago
- ☆20 · Oct 10, 2025 · Updated 4 months ago
- ☆190 · Dec 17, 2024 · Updated last year
- ☆291 · Jul 29, 2025 · Updated 7 months ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ☆237 · May 30, 2025 · Updated 9 months ago
- [NeurIPS24] Optimal-State Dynamics Estimation for Physics-based Human Motion Capture from Videos ☆21 · Jan 27, 2026 · Updated last month
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆95 · Mar 1, 2025 · Updated last year
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning ☆21 · Feb 19, 2025 · Updated last year
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. ☆52 · Oct 14, 2024 · Updated last year
- ☆179 · Jun 27, 2025 · Updated 8 months ago
- Compressed version of Tacotron 2 using Tensor Train + Waveglow. ☆22 · Dec 26, 2019 · Updated 6 years ago
- FQGAN: Factorized Visual Tokenization and Generation ☆59 · Mar 29, 2025 · Updated 11 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆158 · Sep 12, 2025 · Updated 5 months ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆90 · Oct 12, 2024 · Updated last year
- RealisMotion: Decomposed Human Motion Control and Video Generation in the World Space ☆39 · Oct 16, 2025 · Updated 4 months ago
- Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer ☆136 · Oct 14, 2025 · Updated 4 months ago
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,875 · Jan 8, 2026 · Updated last month
- Visual Generation Tuning ☆99 · Jan 27, 2026 · Updated last month
- [Arxiv 2025] ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions ☆45 · Jun 11, 2025 · Updated 8 months ago