828Tina / textvqa_grounding_task_qwen2.5-vl-ft
☆45Updated 2 months ago
Alternatives and similar repositories for textvqa_grounding_task_qwen2.5-vl-ft
Users who are interested in textvqa_grounding_task_qwen2.5-vl-ft are comparing it to the repositories listed below
- New generation of CLIP with fine-grained discrimination capability, ICML 2025☆259Updated last week
- The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning"☆241Updated last week
- ☆78Updated 2 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models☆91Updated last month
- [TPAMI reviewing] Towards Visual Grounding: A Survey☆207Updated 2 months ago
- ☆67Updated 3 months ago
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"☆480Updated last week
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation☆190Updated 4 months ago
- Multimodal (MM) + Chat collection☆274Updated 2 months ago
- Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"☆213Updated last month
- ☆59Updated 6 months ago
- Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning☆236Updated 3 weeks ago
- YOLO-UniOW: Efficient Universal Open-World Object Detection☆149Updated 6 months ago
- The Codes and Data of A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection [ICLR'25]☆146Updated 3 weeks ago
- The official implementation of [CVPR 2025] "5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks".☆346Updated last month
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models☆540Updated last month
- [arXiv'25] Official Implementation of "Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning"☆29Updated last month
- A collection of awesome works on reasoning models like O1/R1 in the visual domain☆35Updated 2 weeks ago
- Fine-tuning Grounding DINO☆127Updated last week
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆240Updated 3 months ago
- ☆44Updated 6 months ago
- Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"☆137Updated 4 months ago
- Research Code for Multimodal-Cognition Team in Ant Group☆161Updated last month
- Notes mainly covering multimodal knowledge for large language model (LLM) algorithm (application) engineers☆222Updated last year
- The Next Step Forward in Multimodal LLM Alignment☆170Updated 3 months ago
- Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future☆191Updated 4 months ago
- [CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".☆160Updated 7 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆89Updated 2 months ago
- Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion☆359Updated 4 months ago
- ☆41Updated last month