PRIS-CV / FairHuman
☆23Updated last week
Alternatives and similar repositories for FairHuman
Users who are interested in FairHuman are comparing it to the libraries listed below.
- ☆11Updated 3 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models☆27Updated last month
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better☆31Updated last month
- ☆25Updated 3 months ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆29Updated 3 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆37Updated 5 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆45Updated last month
- Code of our paper "A Unified Agentic Framework for Evaluating Conditional Image Generation".☆25Updated 3 months ago
- [CVPR 2025] DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles☆26Updated 2 months ago
- ☆53Updated 2 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning☆16Updated 2 months ago
- [CVPR 2025 AI4CC Workshop] Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editin…☆30Updated 2 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…☆30Updated 8 months ago
- TPDiff: Temporal Pyramid Video Diffusion Model☆20Updated 4 months ago
- Official implementation of paper "VMoBA: Mixture-of-Block Attention for Video Diffusion Models"☆34Updated 2 weeks ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆60Updated this week
- ☆34Updated 3 weeks ago
- ☆19Updated last month
- Official implementation of "Auto-Regressively Generating Multi-View Consistent Images". (ICCV 2025)☆47Updated 2 weeks ago
- ☆24Updated 3 months ago
- PyTorch Implementation of "LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding"☆19Updated 4 months ago
- [ICLR 2025] Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception☆14Updated last week
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval☆17Updated 3 weeks ago
- Precision Search through Multi-Style Inputs☆71Updated 2 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆81Updated 10 months ago
- Official Implementation of OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation☆26Updated last week
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning☆22Updated this week
- Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning"☆32Updated 4 months ago
- ☆25Updated 2 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆41Updated 3 months ago