CodeGoat24 / UnifiedReward
Official implementation of Unified Reward Model for Multimodal Understanding and Generation.
☆214 Updated last week
Alternatives and similar repositories for UnifiedReward:
Users who are interested in UnifiedReward are comparing it to the repositories listed below.
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆292 Updated 3 weeks ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment. ☆66 Updated 2 weeks ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey ☆183 Updated this week
- Implements VAR+CLIP for text-to-image (T2I) generation ☆129 Updated 2 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆119 Updated 2 months ago
- High-performance Image Tokenizers for VAR and AR ☆226 Updated this week
- This is a repo to track the latest autoregressive visual generation papers. ☆169 Updated this week
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ☆268 Updated 2 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆167 Updated this week
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat… ☆143 Updated 3 weeks ago
- Video-R1: Towards Super Reasoning Ability in Video Understanding MLLMs ☆105 Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆53 Updated last week
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ☆419 Updated this week
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation ☆190 Updated last month
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ☆256 Updated 3 weeks ago
- The Next Step Forward in Multimodal LLM Alignment ☆135 Updated 3 weeks ago
- This is the official implementation for ControlVAR. ☆101 Updated 3 months ago
- [CVPR 2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆138 Updated 3 weeks ago
- A Unified Tokenizer for Visual Generation and Understanding ☆210 Updated 3 weeks ago
- ☆139 Updated 2 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆246 Updated 2 months ago
- ☆50 Updated this week
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆422 Updated last week
- EVE Series: Encoder-Free Vision-Language Models from BAAI ☆313 Updated 3 weeks ago
- ☆146 Updated 3 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List ☆119 Updated 2 weeks ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆84 Updated 2 months ago
- [TMLR 2025🔥] A survey for the autoregressive models in vision. ☆448 Updated this week
- “FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching” FlowAR employs a simplest scale design and is compatible with an… ☆95 Updated 3 months ago
- [ICLR 2025] Reconstructive Visual Instruction Tuning ☆73 Updated 3 weeks ago