sandy1990418 / Finetune-Qwen2.5-VL
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for vision understanding | LoRA & PEFT support.
☆33 · Updated last month
Alternatives and similar repositories for Finetune-Qwen2.5-VL:
Users who are interested in Finetune-Qwen2.5-VL are comparing it to the libraries listed below.
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆38 · Updated 6 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated 6 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 · Updated 5 months ago
- A Token-level Text Image Foundation Model for Document Understanding ☆78 · Updated last week
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆82 · Updated 2 months ago
- The official repository for the RealSyn dataset ☆21 · Updated last month
- ☆21 · Updated 7 months ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed ☆82 · Updated 5 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆90 · Updated 2 months ago
- The official implementation of RAR ☆84 · Updated last year
- Precision Search through Multi-Style Inputs ☆65 · Updated 8 months ago
- A Simple Framework of Small-scale Large Multimodal Models for Video Understanding Based on TinyLLaVA_Factory. ☆46 · Updated 2 weeks ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆46 · Updated 10 months ago
- ☆73 · Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆118 · Updated 4 months ago
- LinVT: Empower Your Image-level Large Language Model to Understand Videos ☆67 · Updated 3 months ago
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆20 · Updated 3 months ago
- ☆114 · Updated 8 months ago
- The Next Step Forward in Multimodal LLM Alignment ☆138 · Updated 3 weeks ago
- ☆61 · Updated last year
- ☆91 · Updated last year
- A Dead Simple and Modularized Multi-Modal Training and Finetune Framework. Compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc series … ☆19 · Updated 11 months ago
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching ☆22 · Updated 3 months ago
- Building a VLM model starts from the basic module. ☆14 · Updated 11 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆51 · Updated 4 months ago
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 · Updated last year
- 【NeurIPS 2024】Dense Connector for MLLMs ☆157 · Updated 5 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆65 · Updated 4 months ago
- Official repository of MMDU dataset ☆86 · Updated 6 months ago
- The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins… ☆19 · Updated last year