zhangfaen / finetune-Florence-2-large-ft
☆10 · Updated 6 months ago
Alternatives and similar repositories for finetune-Florence-2-large-ft:
Users who are interested in finetune-Florence-2-large-ft are comparing it to the repositories listed below.
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 · Updated 5 months ago
- ☆23 · Updated 8 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆260 · Updated 10 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆56 · Updated 9 months ago
- InstructionGPT-4 ☆39 · Updated last year
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆69 · Updated 5 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆84 · Updated 10 months ago
- ☆51 · Updated last week
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆111 · Updated last month
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆125 · Updated 10 months ago
- A Token-level Text Image Foundation Model for Document Understanding ☆91 · Updated last month
- Precision Search through Multi-Style Inputs ☆68 · Updated last week
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆184 · Updated 3 weeks ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆147 · Updated 10 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support. ☆60 · Updated 2 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆39 · Updated 7 months ago
- A collection of visual instruction tuning datasets. ☆76 · Updated last year
- 【NeurIPS 2024】Dense Connector for MLLMs ☆159 · Updated 6 months ago
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step ☆125 · Updated 2 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆155 · Updated 4 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆139 · Updated 6 months ago
- ☆83 · Updated 11 months ago
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆146 · Updated 10 months ago
- ☆56 · Updated last year
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 · Updated 7 months ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆155 · Updated last month
- Visual Instruction Tuning for Qwen2 Base Model ☆32 · Updated 9 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024] ☆214 · Updated last month
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated 7 months ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆114 · Updated 5 months ago