patrick-tssn / Awesome-Multimodal-Memory
Reading List of Memory Augmented Multimodal Research, including multimodal context modeling, memory in vision and robotics, and external memory/knowledge augmented MLLM.
☆16Updated 7 months ago
Alternatives and similar repositories for Awesome-Multimodal-Memory:
Users that are interested in Awesome-Multimodal-Memory are comparing it to the libraries listed below
- A Self-Training Framework for Vision-Language Reasoning☆76Updated 3 months ago
- ☆30Updated last week
- ☆22Updated 2 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆33Updated 9 months ago
- An Illusion of Progress? Assessing the Current State of Web Agents☆38Updated this week
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆72Updated 5 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆64Updated last week
- [ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials☆28Updated 2 months ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆54Updated 8 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆102Updated last year
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark"☆51Updated this week
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆47Updated 5 months ago
- ☆44Updated 5 months ago
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025]☆62Updated last week
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆65Updated 10 months ago
- ☆69Updated 4 months ago
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding"☆65Updated last month
- ☆59Updated 7 months ago
- ☆81Updated this week
- ☆72Updated 10 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆28Updated 5 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆44Updated 4 months ago
- ☆41Updated 2 weeks ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆106Updated 3 weeks ago
- [CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning☆11Updated last month
- ☆14Updated 2 months ago
- ☆24Updated 5 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"☆61Updated this week
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆22Updated last week
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆60Updated 9 months ago