SJTU-DENG-Lab / Orthus
☆112 · Updated 8 months ago
Alternatives and similar repositories for Orthus
Users that are interested in Orthus are comparing it to the libraries listed below
- Repository for awesome spatial/visual reasoning MLLMs (focusing more on embodied applications) · ☆72 · Updated 7 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning · ☆82 · Updated 9 months ago
- OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models · ☆145 · Updated 9 months ago
- Multi-Reward as Condition for Instruction-Based Image Editing · ☆57 · Updated 10 months ago
- [NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory · ☆125 · Updated 2 months ago
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization · ☆452 · Updated last month
- 🔥 JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization · ☆272 · Updated this week
- Jacobi Forcing: Fast and Accurate Diffusion-style Decoding · ☆153 · Updated 3 weeks ago
- [AAAI 2025] 🎬RCDMs🎬: Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models. RCDMs improve story… · ☆134 · Updated 3 months ago
- (AAAI 2025) MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration · ☆43 · Updated 8 months ago
- The official repo of the paper "MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly" · ☆175 · Updated 2 months ago
- Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?" · ☆160 · Updated 2 months ago
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding · ☆161 · Updated 6 months ago
- Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos · ☆303 · Updated 4 months ago
- ☆160 · Updated last week
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models · ☆41 · Updated 9 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling · ☆41 · Updated 11 months ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation · ☆35 · Updated last year
- [NIPS 2025 DB Oral] Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing · ☆139 · Updated 3 weeks ago
- VideoNSA: Native Sparse Attention Scales Video Understanding · ☆78 · Updated 2 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation · ☆179 · Updated 2 months ago
- [AAAI '26] This is the official PyTorch implementation for the paper: Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acc… · ☆49 · Updated 2 months ago
- ☆141 · Updated 3 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation · ☆74 · Updated 4 months ago
- [ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to the Text-to-Image Generation field and achieve the SoTA bench… · ☆85 · Updated this week
- Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning · ☆208 · Updated 2 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation · ☆129 · Updated last year
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT · ☆114 · Updated 2 months ago
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis" · ☆47 · Updated last year
- A Collection of Papers on Diffusion Language Models · ☆152 · Updated 4 months ago