THUDM / SceneGenAgent
[ACL 2025 Main] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
☆25Updated last year
Alternatives and similar repositories for SceneGenAgent
Users who are interested in SceneGenAgent are comparing it to the repositories listed below
- Reading List of Memory Augmented Multimodal Research, including multimodal context modeling, memory in vision and robotics, and external …☆48Updated last year
- Open Platform for Embodied Agents☆333Updated 10 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆121Updated 4 months ago
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)☆141Updated 6 months ago
- ☆46Updated last year
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks☆87Updated 5 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25)☆46Updated 7 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆193Updated 6 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934☆147Updated last month
- ☆61Updated 2 months ago
- The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.☆61Updated 4 months ago
- [IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc…☆72Updated 10 months ago
- (VillagerAgent ACL 2024) A graph-based Minecraft multi-agent framework☆81Updated 5 months ago
- ☆90Updated last year
- [ECCV2024] 🐙Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.☆293Updated last year
- [CVPR2024] This is the official implementation of MP5☆106Updated last year
- (ICLR 2025) The Official Code Repository for GUI-World.☆67Updated 11 months ago
- ☆52Updated 6 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆104Updated last year
- VeriGUI: Verifiable Long-Chain GUI Dataset☆82Updated last month
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆77Updated 4 months ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆63Updated 8 months ago
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.☆100Updated 4 months ago
- ☆114Updated last month
- ☆100Updated 3 weeks ago
- ☆80Updated last year
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis☆169Updated last month
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks☆181Updated 2 months ago
- ☆104Updated 4 months ago
- ☆68Updated 2 months ago