hbhalpha / MDR
☆19, updated last month
Alternatives and similar repositories for MDR
Users interested in MDR are comparing it to the libraries listed below.
- ☆22, updated 3 months ago
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆120, updated 7 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆341, updated 2 months ago
- ☆58, updated 3 months ago
- Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforce… ☆257, updated last week
- Building a VLM model starting from the basic modules. ☆16, updated last year
- An ecosystem of large language models and multimodal models, mainly covering cross-modal search, speculative decoding, QAT quantization, multimodal quantization, ChatBot, and OCR ☆182, updated last week
- Training a LLaVA model with better Chinese support, with open-sourced training code and data. ☆62, updated 9 months ago
- Notes on multimodal knowledge for large language model (LLM) algorithm/application engineers ☆206, updated last year
- A Survey on Multimodal Retrieval-Augmented Generation ☆240, updated this week
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆191, updated last month
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs' ☆221, updated 2 months ago
- ☆547, updated last week
- [EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge. ☆64, updated last week
- Visual Instruction Tuning for Qwen2 Base Model ☆34, updated last year
- Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision ☆184, updated 3 weeks ago
- ☆85, updated last year
- Study notes on the official LLaVA code ☆26, updated 8 months ago
- Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conv… ☆458, updated 3 months ago
- The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆75, updated last month
- New generation of CLIP with fine-grained discrimination capability, ICML 2025 ☆203, updated last month
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆132, updated last year
- Build a simple, basic multimodal large model from scratch 🤖 ☆41, updated last year
- A simple multimodal RAG project ☆136, updated last month
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ☆48, updated last month
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ☆176, updated 3 months ago
- Code for paper: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models ☆23, updated 6 months ago
- [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant ☆121, updated last month
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR25] ☆272, updated this week
- Research Code for Multimodal-Cognition Team in Ant Group ☆153, updated last month