[CVPR 2026] V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties
☆132Jan 17, 2026Updated last month
Alternatives and similar repositories for V-RGBX
Users that are interested in V-RGBX are comparing it to the libraries listed below
- PICABench: How Far Are We from Physically Realistic Image Editing?☆36Nov 5, 2025Updated 4 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence.☆28Dec 30, 2025Updated 2 months ago
- RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing☆58Dec 26, 2025Updated 2 months ago
- ComfyUI custom node implementation of VideoMaMa for video matting with mask conditioning.☆40Feb 9, 2026Updated last month
- Reinforcing Text-Rich Video Reasoning with Visual Rumination☆27Nov 24, 2025Updated 3 months ago
- MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs☆39Feb 19, 2026Updated 2 weeks ago
- More reliable Video Understanding Evaluation☆14Sep 23, 2025Updated 5 months ago
- EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View Synthesis☆33Dec 8, 2025Updated 3 months ago
- [AAAI'26] SparseSurf: Sparse-View 3D Gaussian Splatting for Surface Reconstruction☆26Nov 19, 2025Updated 3 months ago
- LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding☆35Jan 16, 2026Updated last month
- Extending context length of visual language models☆12Dec 18, 2024Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆21Dec 22, 2025Updated 2 months ago
- OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System.☆19Oct 14, 2024Updated last year
- From Word to World: Can Large Language Models be Implicit Text-based World Models?☆50Dec 25, 2025Updated 2 months ago
- OmniGAIA: Towards Native Omni-Modal AI Agents☆61Feb 28, 2026Updated last week
- ☆36Dec 16, 2025Updated 2 months ago
- [CVPR2026] Code Release of MVInverse: Feedforward Multi-view Inverse Rendering in Seconds☆137Jan 22, 2026Updated last month
- This project is the official implementation of 'DreamOmni3: Scribble-based Editing and Generation'☆38Dec 30, 2025Updated 2 months ago
- paris - world's first decentralized trained open-weight diffusion model☆53Oct 7, 2025Updated 5 months ago
- A node for ComfyUI that adjusts a latent image before the VAE decoding step in order to improve your image quality.☆35Dec 30, 2025Updated 2 months ago
- [CVPR 2026] An official implementation of "Think Visually, Reason Textually: Vision-Language Synergy in ARC"☆37Nov 26, 2025Updated 3 months ago
- EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory☆61Jan 13, 2026Updated last month
- [AAAI 2026] SlideTailor: Personalized Presentation Slide Generation for Scientific Papers☆45Jan 1, 2026Updated 2 months ago
- The official implementation of Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight☆83Jan 16, 2026Updated last month
- ☆56Updated this week
- ☆131Dec 24, 2025Updated 2 months ago
- UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios☆119Dec 17, 2025Updated 2 months ago
- EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]☆126Feb 6, 2026Updated last month
- Official codes for the paper "GARDO: Reinforcing Diffusion Models without Reward Hacking"☆56Feb 2, 2026Updated last month
- A Text2SQL benchmark for evaluation of Large Language Models☆41Updated this week
- Official implementation of CharacterShot: Controllable and Consistent 4D Character Animation☆49Feb 27, 2026Updated last week
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆56Feb 1, 2026Updated last month
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"☆40Mar 31, 2025Updated 11 months ago
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents☆37Oct 7, 2025Updated 5 months ago
- ☆45Nov 9, 2025Updated 4 months ago
- [NeurIPS 2025] A multimodal agent that can interact with its own PC in a multimodal manner.☆35Feb 25, 2026Updated last week
- Official Repository of paper: "MotionEdit: Benchmarking and Learning Motion-Centric Image Editing"☆60Feb 28, 2026Updated last week
- ☆110Sep 3, 2025Updated 6 months ago