linzhiqiu / CLIP-FlanT5
Training code for CLIP-FlanT5
☆30 · Jul 29, 2024 · Updated last year
Alternatives and similar repositories for CLIP-FlanT5
Users that are interested in CLIP-FlanT5 are comparing it to the libraries listed below
- Evaluating text-to-image/video/3D models with VQAScore ☆374 · Sep 22, 2025 · Updated 4 months ago
- VisualGPTScore for visio-linguistic reasoning ☆27 · Oct 7, 2023 · Updated 2 years ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated last year
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models" ☆21 · Mar 26, 2025 · Updated 10 months ago
- ☆10 · Jul 5, 2024 · Updated last year
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data ☆14 · Sep 30, 2023 · Updated 2 years ago
- ☆11 · Oct 2, 2024 · Updated last year
- The SVO-Probes Dataset for Verb Understanding ☆31 · Jan 28, 2022 · Updated 4 years ago
- ☆50 · Oct 29, 2023 · Updated 2 years ago
- Official code for the CVPR 2024 Paper "Can Biases in ImageNet Models Explain Generalization?". ☆13 · Jun 24, 2024 · Updated last year
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022 ☆31 · May 29, 2023 · Updated 2 years ago
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality ☆89 · Feb 13, 2024 · Updated 2 years ago
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆16 · Nov 22, 2024 · Updated last year
- GeckoNum Benchmark for T2I Model Eval. ☆15 · Dec 5, 2024 · Updated last year
- ☆37 · Oct 7, 2023 · Updated 2 years ago
- ☆16 · Jun 14, 2024 · Updated last year
- [ECCV 2024] BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion ☆21 · Jul 2, 2024 · Updated last year
- Extension to `F.grid_sample` that allows using batch index per grid point. ☆19 · Jun 27, 2023 · Updated 2 years ago
- An operation trying to do the opposite of F.grid_sample ☆20 · Aug 8, 2023 · Updated 2 years ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆45 · Nov 29, 2023 · Updated 2 years ago
- ☆17 · Oct 1, 2024 · Updated last year
- ☆16 · Apr 7, 2024 · Updated last year
- TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering ☆181 · Apr 29, 2024 · Updated last year
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆42 · Dec 16, 2025 · Updated 2 months ago
- Fashion-VDM: Video Diffusion Model for Virtual Try-On ☆19 · Nov 4, 2024 · Updated last year
- Visual and Embodied Concepts evaluation benchmark ☆21 · Oct 10, 2023 · Updated 2 years ago
- On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning, … ☆19 · Dec 16, 2024 · Updated last year
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data ☆46 · Oct 15, 2023 · Updated 2 years ago
- Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" ☆52 · Dec 5, 2024 · Updated last year
- [ACM MM 2025] MLLMs for Aesthetics Reasoning ☆23 · Jan 5, 2026 · Updated last month
- ☆21 · Aug 27, 2025 · Updated 5 months ago
- ☆25 · Jun 22, 2023 · Updated 2 years ago
- ☆57 · Aug 16, 2025 · Updated 6 months ago
- A conda-smithy repository for python-spams. ☆23 · Nov 6, 2024 · Updated last year
- ☆23 · Jul 8, 2023 · Updated 2 years ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆61 · Dec 10, 2024 · Updated last year
- [WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling ☆56 · May 10, 2025 · Updated 9 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆24 · Aug 13, 2024 · Updated last year
- Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance ☆23 · Jan 20, 2025 · Updated last year