EPFL-VILAB/fm-vision-evals

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks, ICLR 2026

☆72

Alternatives and similar repositories for fm-vision-evals

Users who are interested in fm-vision-evals are comparing it to the repositories listed below.

HashmatShadab / HSAT
[MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology
☆12 · Jun 17, 2025 · Updated last year
jylei16 / Imagine-e
☆14 · Jan 22, 2025 · Updated last year
Improbable-AI / orso
☆18 · Feb 22, 2025 · Updated last year
mzeeshankaramat / SafeAgents
☆20 · Jun 4, 2026 · Updated last month
mbzuai-oryx / Agent-X
ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
☆43 · Apr 28, 2026 · Updated 3 months ago
jiaangli / VILA
[TACL/EMNLP'24] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16 · Nov 22, 2024 · Updated last year
ilkerkesen / ViLMA
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)
☆16 · Jan 18, 2024 · Updated 2 years ago
FreedomIntelligence / TRIM
We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆22 · Jan 11, 2026 · Updated 6 months ago
Hasindri / HLSS
[MICCAI 2024 🔥] HLSS, the first study to explore hierarchical information inherent in histopathology images and their language descripti…
☆27 · Aug 5, 2024 · Updated last year
markendo / downscaling_intelligence
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models
☆25 · Mar 21, 2026 · Updated 4 months ago
aaronserianni / attention-iou
[CVPR'25] Attention IoU: Examining Biases in CelebA using Attention Maps
☆13 · Mar 26, 2025 · Updated last year
arubique / OCCAM
An implementation of the paper "Are We Done with Object-Centric Learning?"
☆14 · Jun 21, 2026 · Updated last month
hananshafi / MedContext
[MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"
☆14 · Nov 1, 2024 · Updated last year
mbzuai-oryx / VideoMolmo
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
☆56 · Jul 5, 2025 · Updated last year
zlab-princeton / 3d-gen-mem
Code release for "Memorization in 3D Shape Generation: An Empirical Study"
☆21 · Dec 30, 2025 · Updated 6 months ago
rhfeiyang / Opt-In-Art
Official implementation of "Opt-In Art: Learning Art Styles Only from Few Examples" (Accepted by NeurIPS 2025)
☆33 · Nov 30, 2025 · Updated 7 months ago
zhouyiks / CoLVA
☆44 · Jul 9, 2025 · Updated last year
r4dl / nerfinternals
Original reference implementation of "Analyzing the Internals of Neural Radiance Fields"
☆11 · Apr 10, 2024 · Updated 2 years ago
spacetools / SpaceTools
Code release
☆38 · Jun 22, 2026 · Updated last month
hananshafi / MTL-ViT
A new multi-task learning framework using Vision Transformers
☆11 · Jun 19, 2024 · Updated 2 years ago
MJ-Jang / BECEL
☆10 · Jan 28, 2024 · Updated 2 years ago
MathGenie / MathGenie
☆14 · Mar 11, 2024 · Updated 2 years ago
google-research-datasets / egotempo
☆26 · Jun 19, 2026 · Updated last month
ChengHan111 / VPT-or-FT
Official PyTorch implementation of "Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?" (ICLR 2024)
☆13 · Mar 8, 2024 · Updated 2 years ago
haoningwu3639 / SpatialScore
[CVPR 2026 Highlight] SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence
☆84 · May 28, 2026 · Updated 2 months ago
umair1221 / WorldCache
WorldCache: Content-Aware Caching for Accelerated Video World Models
☆21 · Jun 28, 2026 · Updated last month
AlvinWen428 / spatial-relation-benchmark
☆16 · Oct 12, 2024 · Updated last year
wufeim / SpatialReasonerDataGen
Synthetic VQA data generation code for SpatialReasoner.
☆20 · Nov 25, 2025 · Updated 8 months ago
gefend / LIMITR
Implementation of the paper "LIMITR: Leveraging Local Information for Medical Image-Text Representation"
☆17 · Jul 21, 2026 · Updated last week
NYUMedML / headCT_foundation
Foundation 3D ViT model for volumetric head CT - Nature Biomedical Engineering
☆62 · Apr 22, 2026 · Updated 3 months ago
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 · Mar 7, 2025 · Updated last year
aimagelab / MAD
Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Att…
☆15 · Jul 9, 2025 · Updated last year
luka-group / vlm-knowledge-conflict
Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆54 · Oct 19, 2024 · Updated last year
THU-KEG / LongWriter-V
[ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
☆24 · Mar 29, 2025 · Updated last year
KAIST-Visual-AI-Group / BezierFlow
[ICLR 2026] Official code for BézierFlow: Learning Bézier Stochastic Interpolant Schedulers for Few-Step Generation
☆23 · Apr 13, 2026 · Updated 3 months ago
LAION-AI / scaling-laws-for-comparison
☆22 · May 12, 2026 · Updated 2 months ago
techmn / cosnet
A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025)
☆12 · Aug 11, 2025 · Updated 11 months ago
amirzandieh / HyperAttention
Triton implementation of the HyperAttention algorithm
☆48 · Dec 11, 2023 · Updated 2 years ago
HashmatShadab / Robustness-of-Volumetric-Medical-Segmentation-Models
[BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
☆15 · Nov 1, 2024 · Updated last year