nadsoft-opensource / RAG-with-open-source-multi-modal
☆17Updated 10 months ago
Related projects
Alternatives and complementary repositories for RAG-with-open-source-multi-modal
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆69Updated 2 months ago
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆84Updated 2 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆35Updated last month
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.☆59Updated 3 weeks ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆132Updated last month
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆54Updated 5 months ago
- Vision-oriented multimodal AI☆49Updated 5 months ago
- InstructionGPT-4☆37Updated 10 months ago
- Implementation and evaluation of multimodal RAG with text and image inputs for industrial applications☆24Updated 2 weeks ago
- [NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model☆83Updated 11 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆53Updated 3 weeks ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆142Updated last week
- ☆65Updated last year
- A Survey on Benchmarks of Multimodal Large Language Models☆65Updated last month
- ☆30Updated 6 months ago
- arXiv 23 "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs"☆13Updated 9 months ago
- ☆12Updated 4 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆31Updated 4 months ago
- ☆30Updated last week
- Implementation of "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with…☆23Updated last week
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"☆74Updated last week
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆122Updated 4 months ago
- Official PyTorch Implementation of Self-emerging Token Labeling☆30Updated 7 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆38Updated last week
- Code for our Paper "All in an Aggregated Image for In-Image Learning"☆29Updated 7 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆77Updated 5 months ago
- ☆128Updated 5 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 3 months ago
- Building a VLM model starts from the basic module.☆10Updated 7 months ago
- ☆63Updated last month