jun0wanan / awesome-large-multimodal-agents
☆398Updated 4 months ago
Alternatives and similar repositories for awesome-large-multimodal-agents:
Users who are interested in awesome-large-multimodal-agents are comparing it to the repositories listed below
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod…☆292Updated last week
- Paper list about multimodal and large language models, used only to record papers I read on the daily arXiv for personal needs.☆585Updated this week
- Papers related to LLM agents published at top conferences☆311Updated last year
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents☆307Updated 9 months ago
- The model, data and code for the visual GUI Agent SeeClick☆308Updated 2 months ago
- Efficient Multimodal Large Language Models: A Survey☆312Updated 6 months ago
- [ICML'24 Spotlight] "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"☆309Updated 2 months ago
- This is the repository for the Tool Learning survey.☆301Updated this week
- [ACL 2024] A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future☆407Updated 3 weeks ago
- Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi e…☆391Updated last month
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag…☆493Updated 9 months ago
- A series of technical reports on Slow Thinking with LLMs☆393Updated this week
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness☆291Updated 2 months ago
- 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).☆578Updated last month
- Building a comprehensive and handy list of papers for GUI agents☆200Updated 3 weeks ago
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…☆387Updated last month
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step☆259Updated 10 months ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)☆219Updated 7 months ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU☆342Updated last year
- Continual Learning of Large Language Models: A Comprehensive Survey☆339Updated 2 weeks ago
- A collection of recent papers on building autonomous agents. Two topics included: RL-based / LLM-based agents.☆639Updated last month
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆347Updated 3 weeks ago
- Open Platform for Embodied Agents☆284Updated last month
- ☆172Updated 9 months ago
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent☆102Updated 2 months ago
- Official implementation of the paper "Cumulative Reasoning With Large Language Models" (https://arxiv.org/abs/2308.04371)☆288Updated 4 months ago
- ✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models☆513Updated last month
- Towards Large Multimodal Models as Visual Foundation Agents☆179Updated last week
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆264Updated 5 months ago