THUDM / SceneGenAgent
SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
☆17 Updated 4 months ago
Alternatives and similar repositories for SceneGenAgent:
Users who are interested in SceneGenAgent are comparing it to the repositories listed below.
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D… ☆35 Updated 2 months ago
- ☆86 Updated 2 weeks ago
- ☆32 Updated 3 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆73 Updated last month
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆42 Updated 2 months ago
- (ICLR 2025) The Official Code Repository for GUI-World. ☆54 Updated 4 months ago
- ☆33 Updated 4 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆42 Updated 10 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆50 Updated 4 months ago
- A novel multi-modality (Vision) RAG architecture ☆25 Updated 6 months ago
- ☆19 Updated 8 months ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 Updated last year
- MLLM @ Game ☆12 Updated 3 weeks ago
- Code for NeurIPS 2024 paper "AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning" ☆40 Updated 5 months ago
- ☆62 Updated 3 weeks ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆93 Updated 6 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆24 Updated last month
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆80 Updated 6 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 Updated 6 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆62 Updated last week
- The official implementation of the paper "Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction". ☆34 Updated last year
- ☆21 Updated 2 months ago
- ☆40 Updated 2 weeks ago
- ☆41 Updated 5 months ago
- ☆73 Updated last year
- ☆61 Updated 7 months ago
- Code for the SIGGRAPH Asia 2024 paper "Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation" ☆53 Updated this week
- ☆56 Updated 5 months ago
- ☆36 Updated last month
- ☆101 Updated 2 weeks ago