nadsoft-opensource / RAG-with-open-source-multi-modal
☆15Updated 8 months ago
Related projects:
- Parameter-efficient finetuning script for Phi-3-vision, a strong multimodal language model by Microsoft.☆48Updated 3 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆64Updated 2 weeks ago
- ☆79Updated this week
- ☆65Updated last year
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.☆50Updated 2 months ago
- Vision-oriented multimodal AI☆49Updated 3 months ago
- ☆55Updated 3 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆45Updated 4 months ago
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆77Updated last week
- MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆116Updated 2 weeks ago
- ☆78Updated 9 months ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆35Updated last week
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…☆33Updated 11 months ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v …☆123Updated last week
- InstructionGPT-4☆35Updated 8 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆115Updated 2 weeks ago
- Visual Instruction Tuning for Qwen2 Base Model☆14Updated 2 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆30Updated 2 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆54Updated last month
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"☆68Updated last week
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ…☆25Updated this week
- ☆50Updated 2 months ago
- ☆70Updated 6 months ago
- FuseAI Project☆75Updated 3 weeks ago
- An open-source implementation for fine-tuning Qwen2-VL-2B and Qwen2-VL-7B.☆33Updated this week
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …☆132Updated last month
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆138Updated last week
- Florence-2☆32Updated 2 months ago
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)☆134Updated 3 months ago
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆102Updated 3 months ago