zeyofu / Commonsense-T2I
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
☆15 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for Commonsense-T2I
- Official Implementation for "Editing Massive Concepts in Text-to-Image Diffusion Models" ☆17 · Updated 8 months ago
- Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024) ☆25 · Updated last week
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆43 · Updated 11 months ago
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024] ☆51 · Updated this week
- ☆22 · Updated 6 months ago
- Streaming Video Diffusion: Online Video Editing with Diffusion Models ☆16 · Updated 5 months ago
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation (arXiv: 2410.12761) ☆19 · Updated last month
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆55 · Updated last month
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆29 · Updated last week
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆50 · Updated 5 months ago
- ☆36 · Updated last month
- ☆13 · Updated 2 weeks ago
- Official Repository of Personalized Visual Instruct Tuning ☆24 · Updated 2 weeks ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆34 · Updated 8 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆52 · Updated last year
- [ECCV 2024 Oral] ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction ☆44 · Updated 3 months ago
- ☆17 · Updated 5 months ago
- ☆11 · Updated 4 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆29 · Updated 2 weeks ago
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆27 · Updated 5 months ago
- Official code for ICLR 2024 paper "Do Generated Data Always Help Contrastive Learning?" ☆28 · Updated 7 months ago
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image … ☆55 · Updated last month
- ☆12 · Updated 3 months ago
- [CVPR 2024] InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization ☆31 · Updated 5 months ago
- Official source codes of "TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation" ☆25 · Updated last month
- Video Diffusion State Space Models ☆19 · Updated 7 months ago
- [ECCV 2024] Official PyTorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts" ☆32 · Updated 4 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆45 · Updated 5 months ago
- ☆73 · Updated 8 months ago