High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆55 · Jul 23, 2025 · Updated 9 months ago
Alternatives and similar repositories for MGPO
Users that are interested in MGPO are comparing it to the libraries listed below.
- Multi-step reasoning MLLM ☆21 · Mar 8, 2026 · Updated last month
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Dec 14, 2023 · Updated 2 years ago
- [NeurIPS 2025 Spotlight] Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning ☆55 · Apr 16, 2026 · Updated 2 weeks ago
- A framework that allows you to apply Sparse AutoEncoder on any models ☆52 · Jul 11, 2025 · Updated 9 months ago
- ☆63 · Feb 27, 2026 · Updated 2 months ago
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆20 · May 22, 2025 · Updated 11 months ago
- The code of "Deep Regression Representation Learning with Topology" in ICML 2024 ☆14 · Jul 4, 2024 · Updated last year
- Toolbox for GTA-Human Datasets ☆26 · Oct 9, 2024 · Updated last year
- VisPlay: Self-Evolving Vision-Language Models ☆56 · Feb 25, 2026 · Updated 2 months ago
- A framework for camera-controllable image editing using unified geometric guidance and video models. ☆34 · Updated this week
- ☆33 · Jul 15, 2025 · Updated 9 months ago
- [ICCV 2025] Boosting MLLM Reasoning with Text-Debiased Hint-GRPO ☆47 · Jul 1, 2025 · Updated 10 months ago
- official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ☆117 · Dec 4, 2025 · Updated 4 months ago
- ☆78 · May 4, 2025 · Updated 11 months ago
- ☆47 · Jun 24, 2025 · Updated 10 months ago
- ☆22 · Aug 1, 2025 · Updated 9 months ago
- Memory Efficient Training Framework for Large Video Generation Model ☆25 · Apr 22, 2024 · Updated 2 years ago
- ☆14 · May 31, 2022 · Updated 3 years ago
- a ComfyUI plugin that provides a user interface of AudioMass, full-featured web-based audio & waveform editing tool ☆30 · Feb 6, 2026 · Updated 2 months ago
- [ICLR 2024] Code for FreeNoise based on LaVie ☆34 · Jan 28, 2024 · Updated 2 years ago
- ☆10 · Apr 22, 2021 · Updated 5 years ago
- A large-scale place image dataset with multi-faceted annotations. Multi-level place recognition. ☆10 · Jul 15, 2020 · Updated 5 years ago
- Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text… ☆16 · Mar 29, 2023 · Updated 3 years ago
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding. ☆48 · Sep 15, 2025 · Updated 7 months ago
- ☆64 · Mar 8, 2026 · Updated last month
- (ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning ☆60 · Sep 30, 2025 · Updated 7 months ago
- Introduces a novel Video Trimming (VT) task and proposes an agent-based approach (AVT) for detecting wasted footage, selecting valuable se… ☆25 · Jan 20, 2025 · Updated last year
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆148 · Aug 21, 2025 · Updated 8 months ago
- Efficient Feature Extraction for High-resolution Video Frame Interpolation (BMVC 2022) ☆14 · Aug 24, 2023 · Updated 2 years ago
- [NIPS 2025] Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control ☆47 · Apr 1, 2026 · Updated last month
- ☆16 · Sep 25, 2025 · Updated 7 months ago
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models ☆41 · Jan 5, 2026 · Updated 3 months ago
- Official implementation of the paper: [EMNLP 2025] RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruct… ☆21 · Dec 9, 2025 · Updated 4 months ago
- ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation ☆93 · Jul 14, 2025 · Updated 9 months ago
- NTU SC2002 Group Project - Final Year Project Management System (FYPMS) ☆18 · Aug 12, 2025 · Updated 8 months ago
- ☆40 · Mar 3, 2024 · Updated 2 years ago
- [ACL2026] Uni-MMMU : A Massive Multi-discipline Multimodal Unified Benchmark ☆24 · Apr 13, 2026 · Updated 2 weeks ago
- Official code implementation for the paper "Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Expl… ☆12 · Apr 4, 2025 · Updated last year
- Cut2Next: Generating Next Shot via In-Context Tuning ☆32 · Aug 21, 2025 · Updated 8 months ago