zhijie-group / Orthus
☆111 · Updated 7 months ago
Alternatives and similar repositories for Orthus
Users who are interested in Orthus are comparing it to the repositories listed below.
- Repository for awesome spatial/visual reasoning MLLMs (with a focus on embodied applications) ☆72 · Updated 6 months ago
- OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models ☆145 · Updated 8 months ago
- Multi-Reward as Condition for Instruction-Based Image Editing ☆57 · Updated 9 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning ☆63 · Updated 8 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆41 · Updated 8 months ago
- [NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory ☆102 · Updated last month
- Jacobi Forcing: Fast and Accurate Diffusion-style Decoding ☆143 · Updated 2 weeks ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆40 · Updated 10 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆73 · Updated 3 months ago
- [AAAI 2025] 🎬RCDMs🎬: Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models. RCDMs improve story… ☆134 · Updated 3 months ago
- Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?". ☆158 · Updated last month
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization ☆446 · Updated 2 weeks ago
- ☆57 · Updated 4 months ago
- 🔥 JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization ☆246 · Updated this week
- (AAAI 2025) MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration ☆42 · Updated 7 months ago
- The official repo of the paper "MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly" ☆171 · Updated last month
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆185 · Updated last week
- The author's implementation of FUDOKI, a multimodal large language model purely based on discrete flow matching. ☆66 · Updated last week
- ☆80 · Updated 6 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆114 · Updated 5 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆175 · Updated last month
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆35 · Updated last year
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆235 · Updated 4 months ago
- The code repository of UniRL ☆47 · Updated 7 months ago
- [NIPS 2025 DB Oral] Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆131 · Updated this week
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding ☆161 · Updated 5 months ago
- [ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆184 · Updated 7 months ago
- ☆140 · Updated 2 months ago
- Doodling our way to AGI ✏️ 🖼️ 🧠 ☆118 · Updated 7 months ago
- A Collection of Papers on Diffusion Language Models ☆149 · Updated 3 months ago