AdaCheng / VidEgoThink
The official code and data for paper "VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI"
☆13 · Updated 4 months ago
Alternatives and similar repositories for VidEgoThink
Users interested in VidEgoThink are comparing it to the repositories listed below.
- [ICLR 2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆58 · Updated 5 months ago
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning ☆186 · Updated 2 weeks ago
- ☆52 · Updated last month
- Latest advances on (RL-based) multimodal reasoning and generation in Multimodal Large Language Models ☆33 · Updated this week
- The official repository for the paper "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning" ☆126 · Updated 3 weeks ago
- [ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents ☆172 · Updated 3 weeks ago
- Official implementation of the ECCV 2024 paper "Take A Step Back: Rethinking the Two Stages in Visual Reasoning" ☆14 · Updated 2 months ago
- R1-like Video-LLM for Temporal Grounding ☆110 · Updated last month
- ☆71 · Updated 8 months ago
- An easy-to-use, scalable, and high-performance RLHF framework designed for multimodal models ☆139 · Updated 4 months ago
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w… ☆68 · Updated 2 weeks ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆71 · Updated last year
- Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS D&B 2024 Oral) ☆225 · Updated 5 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆105 · Updated last month
- MM-Eureka V0 (also called R1-Multimodal-Journey); the latest version is in MM-Eureka ☆313 · Updated last month
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆360 · Updated 7 months ago
- ✨ First open-source R1-like Video-LLM [2025/02/18] ☆356 · Updated 5 months ago
- ☆50 · Updated last year
- [CVPR 2024] The official implementation of MP5 ☆103 · Updated last year
- Open Platform for Embodied Agents ☆326 · Updated 6 months ago
- Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning ☆236 · Updated 3 weeks ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆69 · Updated 2 months ago
- [ICCV 2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ☆119 · Updated 3 months ago
- Collections of papers and projects for multimodal reasoning ☆105 · Updated 3 months ago
- ☆72 · Updated 2 weeks ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models ☆54 · Updated 2 months ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents ☆158 · Updated 3 months ago
- Official repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆239 · Updated 2 months ago
- Official code for the paper "Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld" ☆57 · Updated 10 months ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects ☆20 · Updated 5 months ago