McGill-NLP / AURORA

Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation

☆29

Alternatives and similar repositories for AURORA

Users interested in AURORA are comparing it to the repositories listed below.


aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆44 · Updated last year
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆56 · Updated 2 years ago
Hritikbansal / videocon
☆57 · Updated last year
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆94 · Updated last year
showlab / EvolveDirector
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
☆49 · Updated 11 months ago
TIGER-AI-Lab / VIEScore
Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024…
☆56 · Updated 10 months ago
google / storybench
☆49 · Updated last year
weixi-feng / TC-Bench
☆24 · Updated last year
zeyofu / Commonsense-T2I
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
☆22 · Updated last year
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆77 · Updated 10 months ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆141 · Updated 2 weeks ago
TencentARC / GRPO-CARE
☆75 · Updated 3 months ago
daeunni / VideoRepair
Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement"
☆50 · Updated 10 months ago
KaiyueSun98 / T2I-ReasonBench
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
☆29 · Updated 3 weeks ago
UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆41 · Updated 4 months ago
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆35 · Updated last year
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆77 · Updated 3 months ago
measure-infinity / mulan-code
☆40 · Updated last year
linzhiqiu / CLIP-FlanT5
Training code for CLIP-FlanT5
☆29 · Updated last year
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆46 · Updated 7 months ago
SilentView / LVD-2M
[NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"
☆70 · Updated 11 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆60 · Updated 2 months ago
WeihuangLin / INF-LLaVA
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆42 · Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆57 · Updated last year
jialuli-luka / SELMA
Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
☆35 · Updated last year
weijiawu / ParaDiffusion
[IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model
☆104 · Updated 6 months ago
PhysGame / PhysGame
PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos
☆46 · Updated 3 months ago
Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆150 · Updated 2 weeks ago
hananshafi / llmblueprint
[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"
☆81 · Updated last year
showlab / T2VScore
T2VScore: Towards A Better Metric for Text-to-Video Generation
☆79 · Updated last year