OpenGVLab / ControlLLM
ControlLLM: Augment Language Models with Tools by Searching on Graphs
☆188Updated 6 months ago
Alternatives and similar repositories for ControlLLM:
Users who are interested in ControlLLM are comparing it to the libraries listed below.
- ☆66Updated last year
- ☆159Updated 6 months ago
- ☆120Updated 7 months ago
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆127Updated last month
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)☆193Updated this week
- Rethinking Step-by-step Visual Reasoning in LLMs☆151Updated this week
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…☆117Updated last month
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆70Updated 2 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer☆210Updated 9 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆117Updated last week
- HPT - Open Multimodal LLMs from HyperGAI☆313Updated 7 months ago
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models☆153Updated 2 months ago
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.☆224Updated last year
- Environments, tools, and benchmarks for general computer agents☆188Updated 2 months ago
- A family of highly capable yet efficient large multimodal models☆176Updated 4 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆278Updated 2 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆188Updated last week
- UGround: Universal GUI Visual Grounding for GUI Agents☆138Updated this week
- ☆78Updated last month
- Long Context Transfer from Language to Vision☆356Updated last month
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…☆378Updated this week
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks☆50Updated this week
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)☆173Updated 3 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆40Updated 6 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆44Updated last month
- Touchstone: Evaluating Vision-Language Models by Language Models☆80Updated last year
- ☆73Updated 10 months ago
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation☆126Updated last year
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models.☆61Updated last month
- ☆131Updated 7 months ago