zhijie-group / Orthus

☆111

Alternatives and similar repositories for Orthus

Users who are interested in Orthus are comparing it to the repositories listed below.


vaew / Awesome-spatial-visual-reasoning-MLLMs
Repository for awesome spatial/visual reasoning MLLMs. (focus more on embodied applications)
☆72 Updated 6 months ago
hustvl / OmniMamba
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
☆145 Updated 8 months ago
bytedance / Multi-Reward-Editing
Multi-Reward as Condition for Instruction-Based Image Editing
☆57 Updated 9 months ago
OpenGVLab / TimeSuite
[ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
☆63 Updated 8 months ago
MME-Benchmarks / MME-Unify
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆41 Updated 8 months ago
MCG-NJU / StreamForest
[NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory
☆102 Updated last month
hao-ai-lab / JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding
☆143 Updated 2 weeks ago
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆40 Updated 10 months ago
Gen-Verse / HermesFlow
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆73 Updated 3 months ago
muzishen / RCDMs
[AAAI 2025] 🎬RCDMs🎬: Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models. RCDMs improve story…
☆134 Updated 3 months ago
A113N-W3I / TIIF-Bench
Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?".
☆158 Updated last month
jingyi0000 / R1-VL
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
☆446 Updated 2 weeks ago
aim-uofa / dLLM-MidTruth
☆57 Updated 4 months ago
LYL1015 / JarvisEvo
🔥 JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization
☆246 Updated this week
DINGYANB / MUSES
（AAAI 2025）MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
☆42 Updated 7 months ago
EdinburghNLP / MMLongBench
The official repo of the paper "MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"
☆171 Updated last month
Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆185 Updated last week
fudoki-hku / FUDOKI
The author's implementation of FUDOKI, a multimodal large language model purely based on discrete flow matching.
☆66 Updated last week
TencentARC / GRPO-CARE
☆80 Updated 6 months ago
yu-rp / Dimple
Dimple, the first Discrete Diffusion Multimodal Large Language Model
☆114 Updated 5 months ago
PKU-YuanGroup / WISE
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
☆175 Updated last month
LeapLabTHU / AdaNAT
[ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
☆35 Updated last year
TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆235 Updated 4 months ago
showlab / UniRL
The code repository of UniRL
☆47 Updated 7 months ago
PhoenixZ810 / RISEBench
[NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
☆131 Updated this week
keshik6 / HourVideo
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
☆161 Updated 5 months ago
wusize / Harmon
[ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
☆184 Updated 7 months ago
TencentARC / MindOmni
☆140 Updated 2 months ago
GAIR-NLP / thinking-with-generated-images
Doodling our way to AGI ✏️ 🖼️ 🧠
☆118 Updated 7 months ago
ML-GSAI / Diffusion-LLM-Papers
A Collection of Papers on Diffusion Language Models
☆149 Updated 3 months ago