THUDM / SceneGenAgent
[ACL 2025 Main] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
☆25Updated 11 months ago
Alternatives and similar repositories for SceneGenAgent
Users who are interested in SceneGenAgent are comparing it to the repositories listed below.
- Reading List of Memory Augmented Multimodal Research, including multimodal context modeling, memory in vision and robotics, and external …☆46Updated last year
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)☆134Updated 5 months ago
- (VillagerAgent ACL 2024) A graph-based Minecraft multi-agent framework☆81Updated 4 months ago
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆77Updated 4 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆192Updated 6 months ago
- [IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc…☆72Updated 9 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR 2025)☆45Updated 6 months ago
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step☆150Updated 3 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks☆87Updated 4 months ago
- ☆36Updated 11 months ago
- ☆91Updated last year
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆115Updated 3 months ago
- ☆60Updated 2 months ago
- (ICLR 2025) The Official Code Repository for GUI-World.☆67Updated 10 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934☆130Updated last week
- Open Platform for Embodied Agents☆332Updated 9 months ago
- [ECCV2024] 🐙Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.☆292Updated last year
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time☆85Updated 4 months ago
- The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.☆59Updated 4 months ago
- ☆50Updated 5 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models.☆73Updated 11 months ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆62Updated 7 months ago
- [CVPR2024] This is the official implementation of MP5☆105Updated last year
- ☆102Updated 3 months ago
- Towards Large Multimodal Models as Visual Foundation Agents☆241Updated 6 months ago
- (ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback☆34Updated 4 months ago
- ☆45Updated last year
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World☆134Updated last year
- ☆23Updated 3 years ago
- A paper list for spatial reasoning☆157Updated this week