HKUST-LongGroup / Diff-II
[CVPR 2025] PyTorch implementation of Diff-II
☆20Updated 9 months ago
Alternatives and similar repositories for Diff-II
Users that are interested in Diff-II are comparing it to the libraries listed below
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better☆295Updated 10 months ago
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation☆33Updated 8 months ago
- [NeurIPS2024]☆31Updated 11 months ago
- [ICLR2025] Official code for Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing.☆20Updated 6 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆117Updated last week
- ☆24Updated 10 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆136Updated 6 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆231Updated 3 months ago
- [NeurIPS 2025 Spotlight] VisualQuality-R1 is the first open-source NR-IQA model that can accurately describe and rate image quality.☆132Updated last month
- An unofficial implementation of the paper “DiffEdit: Diffusion-based semantic image editing with mask guidance”☆39Updated 2 years ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization☆74Updated last week
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆141Updated 11 months ago
- [NeurIPS 2023 & TPAMI] T2I-CompBench (++) for Compositional Text-to-image Generation Evaluation☆316Updated 2 months ago
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight]☆60Updated 4 months ago
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…☆164Updated last week
- Official code of SmartEdit [CVPR-2024 Highlight]☆359Updated last year
- [ECCV 2024] Official repository of ECCV 2024 paper: Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion M…☆15Updated 6 months ago
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆90Updated 7 months ago
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…☆96Updated 7 months ago
- Unified Multi-modal IAA Baseline and Benchmark☆90Updated last year
- ☆138Updated last year
- [ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion☆195Updated 4 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List☆155Updated 8 months ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)☆48Updated last month
- Official repository for CoMM Dataset☆48Updated 11 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆108Updated 6 months ago
- ☆30Updated last year
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆55Updated 8 months ago
- What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness☆24Updated 6 months ago
- [ICCV 2025] Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning.☆47Updated 2 weeks ago