☆532 · Feb 26, 2026 · Updated this week
Alternatives and similar repositories for DeepEyesV2
Users interested in DeepEyesV2 are comparing it to the repositories listed below.
- ☆1,137 · Nov 20, 2025 · Updated 3 months ago
- Fully Open Framework for Democratized Multimodal Reinforcement Learning. ☆43 · Dec 19, 2025 · Updated 2 months ago
- Official implementation of "Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models" ☆14 · Mar 19, 2025 · Updated 11 months ago
- [MICCAI'25 Early Accept] MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment ☆17 · Updated this week
- ☆11 · Jun 21, 2025 · Updated 8 months ago
- [CVPR 2026] Thinking with Programming Vision: Towards a Unified View for Thinking with Images ☆56 · Jan 23, 2026 · Updated last month
- OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images. ☆354 · Jun 1, 2025 · Updated 9 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… ☆1,338 · Feb 3, 2026 · Updated last month
- [ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning" ☆189 · Feb 8, 2026 · Updated 3 weeks ago
- [ICLR 2026] "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use" ☆156 · Feb 7, 2026 · Updated 3 weeks ago
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" ☆405 · Jan 29, 2026 · Updated last month
- Official Implementation of "CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning" on MIC… ☆18 · Feb 12, 2025 · Updated last year
- A multi-agent LLM system for detecting and resolving cognitive dissonance. ☆275 · Oct 14, 2025 · Updated 4 months ago
- Test-time Scaling for VAR models ☆31 · Sep 19, 2025 · Updated 5 months ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆105 · Sep 18, 2025 · Updated 5 months ago
- This repository collects and organises state-of-the-art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs). ☆280 · Feb 17, 2026 · Updated 2 weeks ago
- An MCP Task Server ☆11 · Mar 7, 2025 · Updated 11 months ago
- VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model ☆14 · Jul 31, 2025 · Updated 7 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS25] ☆278 · Nov 6, 2025 · Updated 3 months ago
- CVPR 2023: Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification ☆105 · May 28, 2024 · Updated last year
- Generate videos using Temporal, Google Gemini, and Veo 2. ☆16 · Jul 11, 2025 · Updated 7 months ago
- Batch Deployment for Document Parsing with AWS Batch & Qwen-2.5-VL ☆49 · Apr 28, 2025 · Updated 10 months ago
- Ensemble Learning of Foundation Models ☆17 · Aug 29, 2025 · Updated 6 months ago
- I will be adding code for different kinds of open-source data extraction tools using Python. ☆10 · Nov 15, 2024 · Updated last year
- Advances in recent large vision language models (LVLMs) ☆15 · Sep 23, 2024 · Updated last year
- a suite of finetuned LLMs for atomically precise function calling 🧪 ☆17 · Feb 6, 2026 · Updated 3 weeks ago
- Retail Search with AI ☆14 · Feb 14, 2026 · Updated 2 weeks ago
- ☆61 · Dec 5, 2025 · Updated 2 months ago
- ☆75 · Mar 7, 2024 · Updated last year
- ☆38 · Jul 14, 2025 · Updated 7 months ago
- Witness the aha moment of VLM with less than $3. ☆4,036 · May 19, 2025 · Updated 9 months ago
- Multimodal RewardBench ☆62 · Feb 21, 2025 · Updated last year
- This is a repository for the course "From Beginner to LLM Developer" by Towards AI. ☆12 · Jan 2, 2025 · Updated last year
- This is an end-to-end demo showing the power of LLM on top of Azure Data Manager for Energy data ☆13 · Apr 3, 2024 · Updated last year
- A local search system implementation using Elasticsearch for Wikipedia data indexing and retrieval. ☆12 · May 17, 2025 · Updated 9 months ago
- ☆13 · Jun 3, 2022 · Updated 3 years ago
- Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning; releases the dataset and the model weights ☆13 · May 26, 2025 · Updated 9 months ago
- [ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision ☆12 · Sep 17, 2023 · Updated 2 years ago
- ☆12 · Apr 25, 2022 · Updated 3 years ago