google-deepmind / geckonum_benchmark_t2i
GeckoNum Benchmark for T2I Model Eval.
☆11Updated 2 months ago
Alternatives and similar repositories for geckonum_benchmark_t2i:
Users that are interested in geckonum_benchmark_t2i are comparing it to the libraries listed below
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆24Updated 4 months ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!☆11Updated last year
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"☆27Updated last year
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆54Updated last year
- Original code base for On Pretraining Data Diversity for Self-Supervised Learning☆13Updated last month
- ☆58Updated last year
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆75Updated last month
- Preference Learning for LLaVA☆37Updated 3 months ago
- ☆23Updated 4 months ago
- Official Pytorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral).☆50Updated last week
- ☆38Updated 3 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆67Updated 2 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆41Updated last month
- ☆23Updated last year
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆44Updated last year
- Official Repository of Personalized Visual Instruct Tuning☆26Updated 3 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆25Updated 5 months ago
- ☆31Updated last year
- Code release for "Understanding Bias in Large-Scale Visual Datasets"☆18Updated 2 months ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆28Updated 9 months ago
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"☆31Updated 11 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆24Updated 2 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆35Updated 8 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"☆31Updated 3 months ago
- (arXiv.2405.18406) RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives☆32Updated 3 months ago
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024☆50Updated 4 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆35Updated 6 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆64Updated 8 months ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"☆13Updated last year