THUDM / SceneGenAgent
[ACL 2025 Main] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
☆33 Updated last year
Alternatives and similar repositories for SceneGenAgent
Users who are interested in SceneGenAgent are comparing it to the libraries listed below.
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025) ☆153 Updated 8 months ago
- [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs ☆53 Updated last year
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆61 Updated last year
- ☆122 Updated 3 months ago
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos ☆64 Updated 5 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆203 Updated 9 months ago
- [TMLR 2025] Reading List of Memory Augmented Multimodal Research, including multimodal context modeling, memory in vision and robotics, a… ☆56 Updated 3 weeks ago
- ☆210 Updated last month
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆63 Updated 10 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆48 Updated 7 months ago
- [ICCV 2025] Improving 3D Large Language Model via Robust Instruction Tuning ☆68 Updated 3 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆138 Updated 3 months ago
- (ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life ☆367 Updated last year
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning" ☆126 Updated 3 months ago
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆167 Updated 3 months ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ☆133 Updated last year
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆77 Updated 7 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆83 Updated 2 weeks ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25) ☆46 Updated 9 months ago
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework. ☆109 Updated 6 months ago
- ☆41 Updated 7 months ago
- Open Platform for Embodied Agents ☆339 Updated last year
- ☆116 Updated 6 months ago
- ☆33 Updated 8 months ago
- ☆27 Updated 3 years ago
- A high-fidelity, general-purpose platform for embodied agent training and testing. ☆165 Updated 3 weeks ago
- The first attempt to replicate o3-like visual clue-tracking reasoning capabilities. ☆64 Updated 6 months ago
- [ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆77 Updated last week
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆73 Updated last year
- ☆19 Updated 7 months ago