CVPR 2025 Accepted Papers
☆24Dec 20, 2025Updated 2 months ago
Alternatives and similar repositories for Mask2DiT
Users who are interested in Mask2DiT are comparing it to the repositories listed below
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability☆16May 8, 2025Updated 10 months ago
- ☆18Mar 21, 2025Updated 11 months ago
- Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset☆106Feb 25, 2026Updated 3 weeks ago
- ☆28Sep 4, 2025Updated 6 months ago
- ☆21Jun 3, 2023Updated 2 years ago
- [CVPR 2025] PoseTraj: Pose-Aware Trajectory Control in Video Diffusion☆21Oct 11, 2025Updated 5 months ago
- ☆19Apr 16, 2025Updated 11 months ago
- ☆101Nov 6, 2025Updated 4 months ago
- USTC Cross-Modal Intelligence Group: weekly paper sharing☆16Nov 20, 2022Updated 3 years ago
- Video Diffusion Transformers are In-Context Learners☆35Jan 6, 2025Updated last year
- Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark☆28Apr 22, 2025Updated 10 months ago
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation☆46Aug 26, 2025Updated 6 months ago
- PyTorch implementation of Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation.☆18Jan 4, 2022Updated 4 years ago
- OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models☆154Mar 4, 2026Updated 2 weeks ago
- Balanced Classification: A Unified Framework for Long-Tailed Object Detection (TMM 2023)☆102Apr 18, 2025Updated 11 months ago
- Consistent Autoregressive Video Generation with Long Context☆75Feb 6, 2026Updated last month
- Top-1 solution from the semifinal of the 3rd Huawei Cloud Unmanned Vehicle Challenge Cup, Traffic sign detection, yolov4, mindspore☆14Aug 26, 2021Updated 4 years ago
- ☆106Jan 6, 2026Updated 2 months ago
- ☆10Feb 16, 2022Updated 4 years ago
- ☆13Jul 10, 2024Updated last year
- Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection☆56Aug 16, 2025Updated 7 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆62Jun 6, 2025Updated 9 months ago
- [WACV 2025] Cross-Task Affinity Learning for Multitask Dense Scene Predictions☆11Jun 12, 2025Updated 9 months ago
- ☆14Feb 16, 2022Updated 4 years ago
- ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting☆91Feb 11, 2023Updated 3 years ago
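- Open-source code for PANDA large-scene multi-object detection and tracking (preliminary-round detection), ranked 13th in the preliminary round☆13Jul 17, 2021Updated 4 years ago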
- Official code for CustAny: Customizing Anything from A Single Example. Accepted by CVPR2025 (Oral)☆48Apr 10, 2025Updated 11 months ago
- Animate Any Character in Any World☆96Mar 10, 2026Updated last week
- Jittor ("计图") Algorithm Challenge: fine-grained dog classification, 4/430☆10Apr 26, 2021Updated 4 years ago
- DreamStyle: A Unified Framework for Video Stylization☆113Jan 7, 2026Updated 2 months ago
- [CVPR 2024] Official implementation of "DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations"☆279Jul 5, 2025Updated 8 months ago
- [CVPR 2025] Decision SpikeFormer: Spike-Driven Transformer for Decision Making☆18Aug 8, 2025Updated 7 months ago
- [IJCAI-2024] The official code of Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition☆10Aug 10, 2025Updated 7 months ago
- [CVPR'25] Conformal prediction for vision-language models. Enhancing VLMs deployment with reliability guarantees.☆19Jun 7, 2025Updated 9 months ago
- UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs (WWW'25)☆18Apr 22, 2025Updated 10 months ago
- [MM'22 Oral] AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation☆11Apr 3, 2023Updated 2 years ago
- ☆13Jul 24, 2017Updated 8 years ago
- [CVPR 2025] "DepthCues: Evaluating Monocular Depth Perception in Large Vision Models", Duolikun Danier, Mehmet Aygün, Changjian Li, Hakan…☆21Mar 17, 2025Updated last year
- [ECCV 2024] Official code repository of paper titled "Efficient 3D-Aware Facial Image Editing Via Attribute-Specific Prompt Learning"☆10Aug 2, 2024Updated last year