zhangfaen / finetune-Florence-2-large-ft

☆9

Alternatives and similar repositories for finetune-Florence-2-large-ft

Users who are interested in finetune-Florence-2-large-ft are comparing it to the repositories listed below.

xinyanghuang7 / Basic-Visual-Language-Model
Build a simple, basic multimodal large model from scratch. 🤖
☆44 Updated last year
fjiangAI / MMAPIS
☆19 Updated 9 months ago
bombom713 / Try-On-Diffusion
☆10 Updated last year
sandy1990418 / Finetune-Qwen2.5-VL
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.
☆101 Updated 5 months ago
ChangxinWang / BoFiCap
Bounding and Filling: A Fast and Flexible Framework for Image Captioning
☆9 Updated last year
waltonfuture / InstructionGPT-4
InstructionGPT-4
☆39 Updated last year
deepglint / UniME
[ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆81 Updated 3 weeks ago
HAWLYQ / InfoMetIC
☆14 Updated last year
adlnlp / form_nlu
☆14 Updated 8 months ago
johncaged / OPT_Questioner
Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"
☆15 Updated last year
Token-family / TokenFD
[ICCV2025] A Token-level Text Image Foundation Model for Document Understanding
☆109 Updated 3 weeks ago
360CVGroup / FG-CLIP
New generation of CLIP with fine grained discrimination capability, ICML2025
☆246 Updated 2 weeks ago
NExT-ChatV / NExT-Chat
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
☆246 Updated last year
JarvisUSTC / Awesome-Multimodal-RAG
A curated list of the latest advancements, papers, tools, and datasets for **Multimodal Retrieval-Augmented Generation (RAG)**. Multimoda…
☆23 Updated 6 months ago
SJTU-DeepVisionLab / FLoRA
☆40 Updated last year
XMUDeepLIT / LLaVE
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
☆63 Updated 2 months ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆62 Updated 8 months ago
AI9Stars / XLRS-Bench
[CVPR 2025 Highlight] XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
☆43 Updated last month
ding523 / Curr_REFT
☆64 Updated 2 months ago
WatchTower-Liu / VLM-learning
Building a VLM, starting from the basic modules.
☆17 Updated last year
LinWeizheDragon / FLMR
The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.
☆93 Updated last month
om-ai-lab / ZoomEye
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆46 Updated 6 months ago
opendatalab / VIGC
AAAI 2024: Visual Instruction Generation and Correction
☆93 Updated last year
DataArcTech / RagVL
Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …
☆78 Updated 8 months ago
Bhashini-IITJ / visualTranslation
Implementation of Baseline for Scene Text-to-Scene Text Translation
☆16 Updated 3 months ago
juzhengz / LoRI
[COLM 2025] LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation
☆139 Updated 2 weeks ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆165 Updated 10 months ago
allenai / pixmo-docs
ACL 2025: Synthetic data generation pipelines for text-rich images.
☆90 Updated 4 months ago
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
☆221 Updated 4 months ago
alipay / Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
☆159 Updated 2 weeks ago