THUDM / SceneGenAgent
[ACL 2025 Main] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
☆32 · Updated last year
Alternatives and similar repositories for SceneGenAgent
Users who are interested in SceneGenAgent are comparing it to the repositories listed below.
- [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs ☆52 · Updated last year
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆198 · Updated 8 months ago
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025) ☆146 · Updated 7 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆77 · Updated 6 months ago
- ☆25 · Updated 3 years ago
- (VillagerAgent ACL 2024) A Graph based Minecraft multi agents framework ☆82 · Updated 6 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25) ☆46 · Updated 9 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆61 · Updated last year
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆63 · Updated 9 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934 ☆181 · Updated 2 months ago
- ☆116 · Updated 2 months ago
- ☆88 · Updated 5 months ago
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning" ☆121 · Updated 2 months ago
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆166 · Updated 3 months ago
- [ICCV 2025] Improving 3D Large Language Model via Robust Instruction Tuning ☆65 · Updated 2 months ago
- A paper list that includes world models or generative video models for embodied agents. ☆25 · Updated 11 months ago
- Official implementation of "Self-Improving Video Generation" ☆76 · Updated 8 months ago
- ☆191 · Updated 3 weeks ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning ☆56 · Updated 2 months ago
- ☆38 · Updated 4 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆91 · Updated 6 months ago
- ☆62 · Updated 4 months ago
- ☆113 · Updated 5 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆136 · Updated 3 months ago
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning ☆78 · Updated last year
- Open Platform for Embodied Agents ☆336 · Updated last year
- ☆33 · Updated 7 months ago
- A high-fidelity, general-purpose platform for embodied agent training and testing. ☆154 · Updated this week
- [CVPR 2024] This is the official implementation of MP5 ☆106 · Updated last year
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework. ☆105 · Updated 5 months ago