microsoft / x-reasoner
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
☆30 Updated this week
Alternatives and similar repositories for x-reasoner:
Users who are interested in x-reasoner compare it to the libraries listed below.
- m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models ☆25 Updated 3 weeks ago
- "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA" ☆17 Updated 2 months ago
- ☆48 Updated 2 months ago
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs ☆11 Updated last month
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆43 Updated 2 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆22 Updated last week
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆25 Updated last month
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 Updated last year
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆65 Updated 11 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆28 Updated 2 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆52 Updated last month
- [CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"… ☆20 Updated last month
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆34 Updated 4 months ago
- Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More ☆19 Updated 2 months ago
- ☆36 Updated 3 months ago
- ☆45 Updated 3 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆53 Updated this week
- Preference Learning for LLaVA ☆44 Updated 6 months ago
- ☆14 Updated 4 months ago
- ☆40 Updated 4 months ago
- CLIP-MoE: Mixture of Experts for CLIP ☆32 Updated 7 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆20 Updated 2 weeks ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 Updated 6 months ago
- Dataset of paper: On the Compositional Generalization of Multimodal LLMs for Medical Imaging ☆32 Updated 4 months ago
- Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆14 Updated last month
- ☆17 Updated 4 months ago
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆12 Updated 5 months ago
- Multimodal RewardBench ☆39 Updated 2 months ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆42 Updated last week
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆24 Updated 7 months ago