THUDM / SceneGenAgentLinks
[ACL 2025 Main] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
☆31Updated last year
Alternatives and similar repositories for SceneGenAgent
Users that are interested in SceneGenAgent are comparing it to the libraries listed below
Sorting:
- [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs☆52Updated last year
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆196Updated 7 months ago
- The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.☆61Updated 5 months ago
- ☆108Updated last month
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.☆61Updated last year
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆164Updated 2 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models☆130Updated last week
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25)☆46Updated 8 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆108Updated last month
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO☆76Updated last month
- ☆87Updated 5 months ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆63Updated 8 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934☆158Updated last month
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight)☆116Updated 9 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆80Updated 6 months ago
- [3DV 2026] SpatialGen: Layout-guided 3D Indoor Scene Generation☆326Updated last month
- ☆68Updated 6 months ago
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)☆145Updated 6 months ago
- ☆42Updated 6 months ago
- VCode: SVG as Symbolic Visual Representation☆115Updated last month
- ☆126Updated this week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆193Updated 2 months ago
- [ICCV 2025] Improving 3D Large Language Model via Robust Instruction Tuning☆64Updated 2 months ago
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"☆115Updated last month
- Official repository for "Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation" (ICLR2025)☆72Updated 8 months ago
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆77Updated 5 months ago
- ☆62Updated 3 months ago
- (VillagerAgent ACL 2024) A Graph based Minecraft multi agents framework☆82Updated 6 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated last year
- Virtual Community: An Open World for Humans, Robots, and Society☆177Updated 3 weeks ago