qunzhongwang / vr-thinker
☆32 Updated last month
Alternatives and similar repositories for vr-thinker
Users who are interested in vr-thinker are comparing it to the repositories listed below.
- T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation ☆31 Updated 2 months ago
- ImaginaryNet: Learning Object Detectors without Real Images and Annotations ☆26 Updated 2 years ago
- Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation ☆21 Updated 3 months ago
- A curated list of papers and resources for text-to-image evaluation. ☆30 Updated 2 years ago
- [CVPR 2024 Highlight] ImageNet-D ☆44 Updated last year
- Unifying Specialized Visual Encoders for Video Language Models ☆22 Updated 4 months ago
- Video Diffusion State Space Models ☆19 Updated last year
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆39 Updated 9 months ago
- ORES: Open-vocabulary Responsible Visual Synthesis ☆13 Updated last year
- ☆39 Updated last year
- ReNeg: Learning Negative Embedding with Reward Guidance ☆35 Updated 10 months ago
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. ☆50 Updated last year
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 Updated 2 years ago
- A framework that allows you to apply Sparse AutoEncoder on any models ☆44 Updated 4 months ago
- ☆61 Updated 2 years ago
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L… ☆56 Updated last year
- Image Tokenizer Needs Post-Training ☆24 Updated last month
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆25 Updated last year
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆79 Updated last year
- ☆78 Updated 4 months ago
- ☆94 Updated 4 months ago
- Curated list of recent visual autoregressive (VAR) modeling works ☆31 Updated 8 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆36 Updated 11 months ago
- ☆21 Updated 5 months ago
- ☆19 Updated 2 years ago
- ☆12 Updated 6 months ago
- ☆13 Updated last year
- Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation (ICCV 2023) ☆66 Updated 2 years ago
- Official implementation of LaVin-DiT ☆47 Updated 9 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆77 Updated 4 months ago