THUDM / VisionReward
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
☆216 Updated 3 weeks ago
Alternatives and similar repositories for VisionReward:
Users who are interested in VisionReward are comparing it to the repositories listed below:
- official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ☆89 Updated 2 months ago
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ☆272 Updated last month
- GenEval: An object-focused framework for evaluating text-to-image alignment ☆220 Updated last month
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning ☆159 Updated last week
- STAR: Scale-wise Text-to-image generation via Auto-Regressive representations ☆140 Updated last month
- Improving Video Generation with Human Feedback ☆157 Updated 2 weeks ago
- 【CVPR 2025 Oral】Official Repo for Paper "AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea" ☆99 Updated last week
- [CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project ☆146 Updated 3 weeks ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆83 Updated 9 months ago
- ☆78 Updated 2 weeks ago
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation ☆73 Updated 3 weeks ago
- [NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models ☆270 Updated 4 months ago
- [CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models ☆166 Updated 6 months ago
- [NeurIPS 2024] Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis ☆65 Updated 2 months ago
- [NeurIPS 2023 & TPAMI] T2I-CompBench (++) for Compositional Text-to-image Generation Evaluation ☆249 Updated last week
- [ICML 2024] On Discrete Prompt Optimization for Diffusion Models - Google ☆52 Updated 8 months ago
- ☆192 Updated 2 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models ☆143 Updated 6 months ago
- [CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model" ☆216 Updated last year
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆144 Updated 5 months ago
- Official code of SmartEdit [CVPR-2024 Highlight] ☆320 Updated 9 months ago
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ☆468 Updated 2 weeks ago
- ☆157 Updated 4 months ago
- [NeurIPS 2024] RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models ☆116 Updated 5 months ago
- Subjects200K dataset ☆107 Updated 3 months ago
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model ☆103 Updated 3 weeks ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆307 Updated last month
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ☆86 Updated this week
- (CVPR 2024) 🧩 TokenCompose: Text-to-Image Diffusion with Token-level Supervision ☆122 Updated 3 months ago
- [CVPR 2024] On the Content Bias in Fréchet Video Distance ☆107 Updated 6 months ago