jmhb0 / microvqa
[CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"
☆23 · Updated last month
Alternatives and similar repositories for microvqa
Users interested in microvqa are comparing it to the repositories listed below.
- Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards ☆40 · Updated last month
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆33 · Updated 3 months ago
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR 2025] ☆15 · Updated last week
- m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models ☆41 · Updated 4 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆47 · Updated 3 months ago
- [ICLR 2025] Video Action Differencing ☆41 · Updated last month
- Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase… ☆13 · Updated last year
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆65 · Updated 2 months ago
- ☆48 · Updated 6 months ago
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024) ☆33 · Updated 10 months ago
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆16 · Updated 9 months ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25) ☆80 · Updated 3 months ago
- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models ☆32 · Updated 4 months ago
- [CVPR 2025] CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning ☆24 · Updated 4 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆79 · Updated last year
- ☆28 · Updated 9 months ago
- Preference Learning for LLaVA ☆48 · Updated 9 months ago
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data ☆45 · Updated last year
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆87 · Updated last year
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" ☆43 · Updated 10 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆30 · Updated 4 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆46 · Updated last month
- [CVPR 2025] BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature ☆78 · Updated 5 months ago
- ☆52 · Updated 7 months ago
- Code and data for the ACL 2024 paper "Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space" ☆16 · Updated last year
- [ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA" ☆22 · Updated 6 months ago
- ☆39 · Updated last year
- ☆38 · Updated 7 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension ☆70 · Updated last year
- ☆23 · Updated 2 months ago