SJTU-DENG-Lab / Orthus
☆112Updated 8 months ago
Alternatives and similar repositories for Orthus
Users who are interested in Orthus are comparing it to the repositories listed below.
- OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models☆145Updated 9 months ago
- Repository for awesome spatial/visual reasoning MLLMs. (focus more on embodied applications)☆72Updated 7 months ago
- Multi-Reward as Condition for Instruction-Based Image Editing☆58Updated 10 months ago
- [NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory☆142Updated 3 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆90Updated 10 months ago
- ☆51Updated last week
- Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?".☆159Updated 2 months ago
- Jacobi Forcing: Fast and Accurate Diffusion-style Decoding☆154Updated last month
- [AAAI 2025] 🎬RCDMs🎬: Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models. RCDMs improve story…☆134Updated 4 months ago
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization☆454Updated last month
- 🔥 JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization☆329Updated last week
- ☆169Updated 3 weeks ago
- (AAAI 2025) MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration☆42Updated 8 months ago
- [ACL 2025 Main] MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration☆22Updated 8 months ago
- [AAAI 26'] This is the official pytorch implementation for paper: Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acc…☆61Updated 2 months ago
- The official repo of the paper "MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"☆176Updated 2 months ago
- Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos☆304Updated 4 months ago
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding☆161Updated 6 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆43Updated 10 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆75Updated 4 months ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation☆35Updated last year
- A Collection of Papers on Diffusion Language Models☆155Updated 4 months ago
- [NeurIPS 2025] Native-resolution diffusion Transformer☆291Updated 3 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆41Updated 11 months ago
- Doodling our way to AGI ✏️ 🖼️ 🧠☆121Updated 8 months ago
- ☆81Updated 7 months ago
- Official implementation of MIA-DPO☆70Updated last year
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆236Updated 5 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆91Updated 6 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆186Updated 8 months ago