xlang-ai / VideoAgentTrek
The official repo of VideoAgentTrek
☆37Updated 2 months ago
Alternatives and similar repositories for VideoAgentTrek
Users who are interested in VideoAgentTrek are comparing it to the repositories listed below
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆199Updated 2 months ago
- Official repo for UAE☆116Updated this week
- ☆37Updated last month
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning☆32Updated 4 months ago
- This is the official repository of InfiniteVL☆65Updated 2 weeks ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆62Updated 8 months ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆36Updated 11 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆228Updated last week
- [NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting☆68Updated 6 months ago
- ☆68Updated 3 months ago
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.☆96Updated this week
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models".☆50Updated last week
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆53Updated last week
- ☆63Updated 5 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models☆38Updated 6 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models☆143Updated this week
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆79Updated last year
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible☆113Updated 4 months ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning☆56Updated 2 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆125Updated 5 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆51Updated 5 months ago
- Geometric-Mean Policy Optimization☆94Updated last month
- ☆39Updated 7 months ago
- ☆180Updated 2 weeks ago
- ☆62Updated 3 months ago
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"☆72Updated 3 weeks ago
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆164Updated 2 months ago
- ☆48Updated this week
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆190Updated last week
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding☆186Updated 2 weeks ago