yuhui-zh15/AutoConverter

Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 2025)

☆40

Alternatives and similar repositories for AutoConverter

Users who are interested in AutoConverter are comparing it to the repositories listed below.

orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆72 · Jul 10, 2024 · Updated 2 years ago
jmhb0 / viddiff
[ICLR 2025] Video Action Differencing
☆53 · Jul 3, 2025 · Updated last year
yuhui-zh15 / C3
Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024)
☆36 · Oct 16, 2024 · Updated last year
jmhb0 / PaperSearchQA
[EACL 2026] PaperSearchQA. Data generation pipeline for QA over scientific papers, suitable for RL training search agents
☆34 · Feb 4, 2026 · Updated 5 months ago
transductive-visualprogram / tvp
☆15 · Jan 7, 2026 · Updated 6 months ago
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆98 · Oct 19, 2024 · Updated last year
deep-spin / Infinite-Video
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
☆21 · Feb 14, 2025 · Updated last year
zeyofu / ReFocus_Code
Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]
☆50 · Jul 22, 2025 · Updated last year
tang-bd / v-grpo
[CVPR 2026 Findings] V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
☆56 · Apr 28, 2026 · Updated 2 months ago
WxxShirley / MoLoRAG
[EMNLP 2025] Official implementation for paper "MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval"
☆27 · Mar 17, 2026 · Updated 4 months ago
TomSheng21 / R-TPT
CVPR 2025 - R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
☆22 · Aug 28, 2025 · Updated 10 months ago
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆68 · Feb 21, 2025 · Updated last year
jmhb0 / microvqa
[CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"…
☆36 · Nov 25, 2025 · Updated 8 months ago
UCSB-AI / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28 · Jul 15, 2025 · Updated last year
anitarau / SurgBenchKit
Repo for our work "Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence"
☆21 · Jun 2, 2025 · Updated last year
si0wang / VisVM
☆46 · Dec 30, 2024 · Updated last year
KaihuaTang / Qwen-Tokenizer-Pruner
Due to the huge vocabulary size (151,936) of Qwen models, the Embedding and LM Head weights are excessively heavy. Therefore, this projec…
☆40 · Jan 6, 2026 · Updated 6 months ago
alipay / POA
☆22 · Aug 8, 2024 · Updated last year
MiliLab / REX-RAG
Official repo for "REX-RAG: Reasoning Exploration with Policy Correction in Retrieval-Augmented Generation"
☆35 · Sep 28, 2025 · Updated 9 months ago
EPFL-IMOS / TrustVLM
To Trust Or Not To Trust Your Vision-Language Model's Prediction
☆15 · May 30, 2025 · Updated last year
FSoft-AI4Code / VisualCoder
[NAACL 2025] Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning
☆10 · Feb 9, 2025 · Updated last year
Moenupa / VTCBench
Code and data for VTCBench, a VLM benchmark for long-context understanding capabilities under vision-text compression paradigm.
☆27 · Mar 16, 2026 · Updated 4 months ago
yeung-lab / Micro-Bench
A Vision-Language Benchmark for Microscopy Understanding
☆31 · Mar 13, 2025 · Updated last year
HashmatShadab / Robust-LLaVA
[ICCVW 2025 (Oral)] Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
☆29 · Oct 20, 2025 · Updated 9 months ago
VITA-Group / Nabla-Reasoner
[ICLR'26] "Nabla-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space" by Peihao Wang*, Ruisi Cai*, Zhen Wang, Hongyuan…
☆35 · Mar 10, 2026 · Updated 4 months ago
MiliLab / GeoZero
Official repo for "GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes"
☆27 · Feb 11, 2026 · Updated 5 months ago
lisadunlap / VibeCheck
Automated Qualitative Analysis of LLMs (ICLR 2025)
☆53 · Jul 6, 2025 · Updated last year
RayRuiboChen / Self-Filter
☆28 · Jul 10, 2025 · Updated last year
ruili33 / TPO
☆41 · Sep 9, 2025 · Updated 10 months ago
mahtabbigverdi / Aurora
☆12 · Dec 4, 2024 · Updated last year
jaehyun513 / P2T
Official implementation of Tabular Transfer Learning via Prompting LLMs (COLM 2024).
☆13 · Aug 6, 2024 · Updated last year
Lliar-liar / Daily-Omni
This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆42 · Apr 28, 2026 · Updated 2 months ago
UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆48 · Jun 2, 2025 · Updated last year
paulgavrikov / vlm_shapebias
Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025).
☆31 · Jan 26, 2025 · Updated last year
jiaangli / VILA
[TACL/EMNLP'24] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16 · Nov 22, 2024 · Updated last year
nttmdlab-nlp / VDocRAG
[CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
☆66 · May 26, 2025 · Updated last year
AlexDel / levheimcube
☆11 · Feb 16, 2023 · Updated 3 years ago
wj-inf / MagicMirror
Official impl. of "MagicMirror: A Large-Scale Dataset and Benchmark for Fine-Grained Artifacts Assessment in Text-to-Image Generation"
☆24 · Sep 15, 2025 · Updated 10 months ago
MadryLab / pretraining-distribution-shift-robustness
☆14 · Mar 4, 2024 · Updated 2 years ago