google-deepmind / geckonum_benchmark_t2i
GeckoNum Benchmark for T2I Model Eval.
☆11 · Updated 3 months ago
Alternatives and similar repositories for geckonum_benchmark_t2i:
Users interested in geckonum_benchmark_t2i are comparing it to the libraries listed below.
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection" ☆27 · Updated last year
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆56 · Updated last year
- ☆29 · Updated 8 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆63 · Updated last month
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 · Updated last year
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?" ☆31 · Updated last year
- Training code for CLIP-FlanT5 ☆26 · Updated 8 months ago
- Matryoshka Multimodal Models ☆98 · Updated 2 months ago
- Code and data for the paper "Learning Action and Reasoning-Centric Image Editing from Videos and Simulation" ☆24 · Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 · Updated 8 months ago
- [NeurIPS 2024] Official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 · Updated 9 months ago
- Official implementation of the paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters" ☆44 · Updated 3 months ago
- ☆48 · Updated last year
- Evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆116 · Updated 9 months ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon! ☆11 · Updated last year
- ☆45 · Updated this week
- [ICLR 2025] Source code for the paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆69 · Updated 3 months ago
- 🦾 EvalGIM (pronounced "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic… ☆68 · Updated 3 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆29 · Updated 4 months ago
- Official PyTorch implementation of LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral) ☆60 · Updated last month
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder" (NeurIPS 2024) ☆27 · Updated 6 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 · Updated 6 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆24 · Updated 4 months ago
- Implementation and dataset for the paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆36 · Updated 2 weeks ago
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale" (ECCV 2024) ☆51 · Updated 6 months ago
- [CVPR 2023 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally? ☆32 · Updated last year
- [CVPR 2025] A benchmark for evaluating video generative models in generating short stories ☆12 · Updated 3 weeks ago
- ☆40 · Updated 4 months ago
- [arXiv: 2405.18406] RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives ☆36 · Updated 5 months ago
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML 2024) ☆38 · Updated 10 months ago