YanqiDai / MMRole
A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
☆40 · Updated 2 months ago
Alternatives and similar repositories for MMRole:
Users who are interested in MMRole are comparing it to the repositories listed below.
- ☆76 · Updated 8 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆74 · Updated this week
- ☆38 · Updated 6 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆78 · Updated 3 weeks ago
- ☆51 · Updated last month
- ☆73 · Updated 10 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 · Updated 6 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆31 · Updated last month
- A Self-Training Framework for Vision-Language Reasoning ☆60 · Updated 2 months ago
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆128 · Updated last month
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆51 · Updated 2 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆117 · Updated last week
- Official repository of MMDU dataset ☆82 · Updated 3 months ago
- ☆28 · Updated 3 months ago
- ☆28 · Updated last month
- ☆44 · Updated 3 months ago
- ☆58 · Updated 7 months ago
- ☆49 · Updated last week
- A Survey on Benchmarks of Multimodal Large Language Models ☆79 · Updated 2 weeks ago
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models ☆22 · Updated 4 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆106 · Updated 2 months ago
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆58 · Updated 4 months ago
- Code for our Paper "All in an Aggregated Image for In-Image Learning" ☆29 · Updated 9 months ago
- ☆47 · Updated last year
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆55 · Updated 8 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆126 · Updated 2 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆67 · Updated last month
- Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal … ☆38 · Updated last month
- Efficient Mixture of Experts for LLM Paper List ☆26 · Updated last month
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆45 · Updated 2 months ago