828Tina / textvqa_grounding_task_qwen2.5-vl-ft

☆76

Alternatives and similar repositories for textvqa_grounding_task_qwen2.5-vl-ft

Users who are interested in textvqa_grounding_task_qwen2.5-vl-ft are comparing it to the repositories listed below.

PKU-ICST-MIPL / Finedefics_ICLR2025
☆75 Updated 7 months ago
360CVGroup / FG-CLIP
New generation of CLIP with fine-grained discrimination capability, ICML2025
☆472 Updated 3 weeks ago
PKU-ICST-MIPL / DyFo_CVPR2025
☆95 Updated 3 months ago
dvlab-research / VisionReasoner
Vision Manus: Your versatile Visual AI assistant
☆297 Updated last month
Fantasyele / LLaVA-KD
[ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
☆107 Updated last month
geshang777 / Seg-R1
[NeurIPS2025 Workshop] Official Implementation of "Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning"
☆52 Updated 4 months ago
jam-cc / MMAD
The Codes and Data of A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection [ICLR'25]
☆197 Updated 3 months ago
PolyU-ChenLab / UniPixel
🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆190 Updated last month
shufangxun / LLaVA-MoD
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
☆208 Updated 7 months ago
tzjtatata / Myriad
Open-sourced codes, IAD vision-language datasets and pre-trained checkpoints for Myriad.
☆92 Updated 4 months ago
yuanpinz / awesome-deep-multimodal-reasoning
A collection of awesome works on reasoning models like O1/R1 in the visual domain
☆47 Updated 4 months ago
nnnth / UFO
[NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…
☆248 Updated 2 weeks ago
dvlab-research / Seg-Zero
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
☆555 Updated 3 months ago
WatchTower-Liu / VLM-learning
Building a VLM starting from the basic modules.
☆18 Updated last year
Christinepan881 / DINO-R1
☆51 Updated 4 months ago
Mwxinnn / AA-CLIP
The official implementation of AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
☆192 Updated 5 months ago
sandy1990418 / Finetune-Qwen2.5-VL
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.
☆141 Updated 9 months ago
congvvc / HyperSeg
[CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".
☆176 Updated 11 months ago
WeitaiKang / SegVG
[ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
☆64 Updated last year
yayafengzi / LMM-HiMTok
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
☆77 Updated 4 months ago
THU-MIG / YOLO-UniOW
YOLO-UniOW: Efficient Universal Open-World Object Detection
☆166 Updated 10 months ago
linkangheng / PR1
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
☆268 Updated 4 months ago
linhuixiao / Awesome-Visual-Grounding
[TPAMI 2025] Towards Visual Grounding: A Survey
☆260 Updated last week
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆293 Updated 7 months ago
zhangfaen / finetune-Qwen2.5-VL
☆77 Updated 3 months ago
eternaldolphin / LaMI-DETR
[ECCV 2024] Official implementation of "LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction"
☆86 Updated 7 months ago
AI-Application-and-Integration-Lab / SAM4MLLM
[ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
☆43 Updated 8 months ago
ZhangXJ199 / TinyLLaVA-Video-R1
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
☆107 Updated 6 months ago
ding523 / Curr_REFT
☆72 Updated 6 months ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆186 Updated 6 months ago