mat-agent / MAT-Agent
☆19 Updated 2 weeks ago
Alternatives and similar repositories for MAT-Agent:
Users who are interested in MAT-Agent are comparing it to the libraries listed below.
- ☆12 Updated 4 months ago
- ☆66 Updated 9 months ago
- ☆20 Updated last month
- ☆43 Updated last year
- Reading List of Memory Augmented Multimodal Research, including multimodal context modeling, memory in vision and robotics, and external … ☆13 Updated 6 months ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆54 Updated 7 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆70 Updated 4 months ago
- The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects" ☆14 Updated 5 months ago
- ☆14 Updated last year
- ☆143 Updated 5 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆45 Updated 5 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆44 Updated 5 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆73 Updated 2 months ago
- ☆22 Updated this week
- ☆24 Updated 4 months ago
- EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions ☆20 Updated 10 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆102 Updated last year
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU ☆47 Updated last year
- Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024) ☆20 Updated 5 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆70 Updated 9 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆86 Updated last year
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆56 Updated 3 months ago
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding" ☆64 Updated last month
- ☆18 Updated 8 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild ☆47 Updated last year
- The official code repository for PRMBench. ☆68 Updated last month
- ☆34 Updated last year
- ☆95 Updated last year
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark" ☆47 Updated this week
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding ☆259 Updated 5 months ago