google-deepmind / geckonum_benchmark_t2i
GeckoNum Benchmark for T2I Model Eval.
☆11 · Updated 3 months ago
Alternatives and similar repositories for geckonum_benchmark_t2i:
Users interested in geckonum_benchmark_t2i are comparing it to the libraries listed below.
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection" ☆27 · Updated last year
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆56 · Updated last year
- ☆29 · Updated 8 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆63 · Updated last month
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 · Updated last year
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?" ☆31 · Updated last year
- Training code for CLIP-FlanT5 ☆26 · Updated 8 months ago
- Matryoshka Multimodal Models ☆98 · Updated 2 months ago
- Code and data for the paper "Learning Action and Reasoning-Centric Image Editing from Videos and Simulation" ☆24 · Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 · Updated 8 months ago
- [NeurIPS 2024] Official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 · Updated 9 months ago
- Official implementation of the paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters" ☆44 · Updated 3 months ago
- ☆48 · Updated last year
- Evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆116 · Updated 9 months ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon! ☆11 · Updated last year
- ☆45 · Updated this week
- [ICLR 2025] Source code for the paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆69 · Updated 3 months ago
- 🦾 EvalGIM (pronounced "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic… ☆68 · Updated 3 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆29 · Updated 4 months ago
- Official PyTorch implementation of LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral) ☆60 · Updated last month
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder" (NeurIPS 2024) ☆27 · Updated 6 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 · Updated 6 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆24 · Updated 4 months ago
- Implementation and dataset for the paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆36 · Updated 2 weeks ago
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale" (ECCV 2024) ☆51 · Updated 6 months ago
- [CVPR 2023 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally? ☆32 · Updated last year
- [CVPR 2025] A benchmark for evaluating video generative models in generating short stories ☆12 · Updated 3 weeks ago
- ☆40 · Updated 4 months ago
- [arXiv: 2405.18406] RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives ☆36 · Updated 5 months ago
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML 2024) ☆38 · Updated 10 months ago