jun0wanan / awesome-large-multimodal-agents

☆477

Alternatives and similar repositories for awesome-large-multimodal-agents

Users who are interested in awesome-large-multimodal-agents are comparing it to the repositories listed below

Yangyi-Chen / Multimodal-AND-Large-Language-Models
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
☆747 · Updated last month
Atomic-man007 / Awesome_Multimodel_LLM
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod…
☆350 · Updated 8 months ago
MMMU-Benchmark / MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E…
☆527 · Updated 6 months ago
reasoning-survey / Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
☆632 · Updated 5 months ago
OSU-NLP-Group / GUI-Agents-Paper-List
Building a comprehensive and handy list of papers for GUI agents
☆563 · Updated last month
aialt / awesome-mobile-agents
✨✨Latest Papers and Datasets on Mobile and PC GUI Agent
☆140 · Updated last year
THUDM / VisualAgentBench
Towards Large Multimodal Models as Visual Foundation Agents
☆245 · Updated 7 months ago
LightChen233 / Awesome-Long-Chain-of-Thought-Reasoning
Latest Advances on Long Chain-of-Thought Reasoning
☆564 · Updated 4 months ago
vyokky / LLM-Brained-GUI-Agents-Survey
GitHub page for "Large Language Model-Brained GUI Agents: A Survey"
☆212 · Updated 5 months ago
nuster1128 / LLM_Agent_Memory_Survey
☆435 · Updated 4 months ago
njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆444 · Updated 4 months ago
qianlima-lab / awesome-lifelong-llm-agent
This repository collects awesome surveys, resources, and papers for lifelong learning LLM agents
☆252 · Updated 6 months ago
modelscope / awesome-deep-reasoning
Collect every awesome work about r1!
☆423 · Updated 7 months ago
XueyangFeng / LLM-Agent-Paper-Digest
Papers related to LLM agents published at top conferences
☆320 · Updated 7 months ago
open-compass / MMBench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
☆272 · Updated 6 months ago
HaozheZhao / MIC
MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU
☆358 · Updated last year
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆619 · Updated 8 months ago
OpenGVLab / LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
☆318 · Updated last year
zchuz / CoT-Reasoning-Survey
[ACL 2024] A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
☆474 · Updated 10 months ago
showlab / Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
☆916 · Updated 2 months ago
swordlidev / Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey
☆376 · Updated 7 months ago
quchangle1 / LLM-Tool-Survey
This is the repository for the Tool Learning survey.
☆459 · Updated 3 months ago
cooelf / Auto-GUI
Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
☆255 · Updated last year
tianyi-lab / HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(…
☆316 · Updated last month
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆298 · Updated last year
OpenGVLab / Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag…
☆549 · Updated last year
LinWeizheDragon / Retrieval-Augmented-Visual-Question-Answering
This is the official repository for Retrieval Augmented Visual Question Answering
☆242 · Updated 11 months ago
zjysteven / lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,…
☆357 · Updated 2 weeks ago
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆572 · Updated 7 months ago
0russwest0 / Awesome-Agent-RL
☆438 · Updated last month