Ryann-Ran / Scone
Official repository for the Scone (Subject-driven Composition and Distinction Enhancement) model, designed to support multi-subject composition and subject distinction tasks in complex contexts.
☆23 · Updated this week
Alternatives and similar repositories for Scone
Users who are interested in Scone are comparing it to the repositories listed below.
- (ICCV 2025) This repository is the official implementation of AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detect… ☆147 · Updated 5 months ago
- ☆17 · Updated 8 months ago
- Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs ☆30 · Updated 3 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆339 · Updated 2 months ago
- Official PyTorch Code for Anchor Token Guided Prompt Learning Methods: [ICCV 2025] ATPrompt and [Arxiv 2511.21188] AnchorOPT ☆117 · Updated this week
- 🔥CVPR 2025 Multimodal Large Language Models Paper List ☆153 · Updated 9 months ago
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation ☆115 · Updated last week
- [ICCV25 Highlight] The official implementation of the paper "LEGION: Learning to Ground and Explain for Synthetic Image Detection" ☆72 · Updated 2 months ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆77 · Updated 3 months ago
- Easy wrapper for inserting LoRA layers in CLIP. ☆40 · Updated last year
- ☆55 · Updated last year
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models ☆115 · Updated 2 months ago
- [NeurIPS 2025 🔥] FakeVLM: Advancing Synthetic Image Detection through Explainable Multimodal Models and Fine-Grained Artifact Analysis ☆99 · Updated 2 months ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆202 · Updated 5 months ago
- Instruction Tuning in Continual Learning paradigm ☆65 · Updated 10 months ago
- This is a collection of recent papers on reasoning in video generation models. ☆83 · Updated last week
- Official repo for ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models ☆24 · Updated 8 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs' ☆308 · Updated 8 months ago
- FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding (NIPS24) ☆32 · Updated last month
- NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models ☆109 · Updated 4 months ago
- Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain… ☆104 · Updated 4 months ago
- ☆152 · Updated 10 months ago
- [CVPR 2025] FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs ☆41 · Updated last week
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight] ☆62 · Updated 5 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆127 · Updated this week
- Official PyTorch implementation for paper "ProAPO: Progressively Automatic Prompt Optimization for Visual Classification". The paper is a… ☆27 · Updated last month
- [ICLR 2025] Official Implementation of Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection ☆49 · Updated 4 months ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆93 · Updated 2 weeks ago
- Official implementation of MC-LLaVA. ☆139 · Updated last month
- [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant ☆171 · Updated 5 months ago