xlang-ai / VideoAgentTrek
The official repo of VideoAgentTrek
☆42 · Updated 3 months ago
Alternatives and similar repositories for VideoAgentTrek
Users who are interested in VideoAgentTrek are comparing it to the repositories listed below.
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning ☆33 · Updated 5 months ago
- InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models ☆77 · Updated last week
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models". ☆52 · Updated last month
- Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios" ☆28 · Updated 7 months ago
- ☆63 · Updated 7 months ago
- CoV: Chain-of-View Prompting for Spatial Reasoning ☆50 · Updated 2 weeks ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆129 · Updated 6 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆207 · Updated 3 months ago
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models ☆169 · Updated last month
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 · Updated 8 months ago
- ☆33 · Updated 6 months ago
- [NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting ☆70 · Updated last month
- ☆44 · Updated 2 months ago
- Official Repository of Native Parallel Reasoner ☆100 · Updated this week
- [ICLR 2026] Geometric-Mean Policy Optimization ☆99 · Updated 2 weeks ago
- ☆68 · Updated 4 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆37 · Updated last year
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25) ☆46 · Updated 9 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆64 · Updated last week
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 6 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" ☆57 · Updated 2 weeks ago
- ☆39 · Updated 8 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆237 · Updated last week
- Official repo for UAE ☆164 · Updated last month
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning ☆166 · Updated 2 months ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. ☆101 · Updated 3 weeks ago
- ☆63 · Updated last week
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning ☆57 · Updated 3 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆79 · Updated this week
- [ACL 2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆119 · Updated 6 months ago