THUDM / SceneGenAgent
SceneGenAgent: Precise Industrial Scene Generation with Coding Agent
☆11 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for SceneGenAgent
- [CVPR2024] This is the official implementation of MP5 ☆84 Updated 4 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆38 Updated 4 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆82 Updated 4 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆26 Updated last month
- ⛏💎 STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆30 Updated 10 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆99 Updated 8 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆17 Updated last month
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 Updated 2 months ago
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,… ☆113 Updated last month
- ☆74 Updated 8 months ago
- The Code Repo for Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization ☆94 Updated 2 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆56 Updated this week
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆102 Updated last month
- Explore the Limits of Omni-modal Pretraining at Scale ☆89 Updated 2 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆47 Updated this week
- ☆17 Updated last year
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆48 Updated 3 weeks ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 Updated last month
- A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents ☆31 Updated this week
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆61 Updated last month
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆87 Updated 2 weeks ago
- Official repo for StableLLAVA ☆91 Updated 11 months ago
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆46 Updated 2 weeks ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆98 Updated 6 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆38 Updated 7 months ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆49 Updated 2 months ago
- ☆69 Updated 6 months ago
- This repo contains the code and data for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" ☆41 Updated last week
- Official implementation of "Self-Improving Video Generation" ☆52 Updated last week
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆26 Updated 4 months ago