tonychenxyz / vit-interpret
Official implementation of "Interpreting and Controlling Vision Foundation Models via Text Explanations"
☆12 · Updated 3 months ago
Related projects:
- [CBMI2024] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?" ☆17 · Updated 2 months ago
- REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets ☆11 · Updated 11 months ago
- Repository for the paper "Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment" (AAAI'24 Oral) ☆25 · Updated 4 months ago
- ☆19 · Updated last year
- ImaginaryNet: Learning Object Detectors without Real Images and Annotations ☆24 · Updated last year
- ☆17 · Updated last year
- Code for "Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks" ☆16 · Updated 5 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆17 · Updated 3 weeks ago
- [ECCV 2024] Official implementation of "Stitched ViTs are Flexible Vision Backbones" ☆22 · Updated 7 months ago
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection" ☆27 · Updated last year
- ☆55 · Updated 11 months ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Updated last year
- [ECCV'24] MaxFusion: Plug & Play multimodal generation in text-to-image diffusion models ☆17 · Updated last month
- TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment ☆9 · Updated 8 months ago
- MCPL: Multi-Concept Prompt Learning ☆19 · Updated 3 months ago
- Official code for the paper "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter" ☆15 · Updated last year
- Code for Point-Level Region Contrast (https://arxiv.org/abs/2202.04639) ☆32 · Updated last year
- ☆15 · Updated 2 months ago
- ☆15 · Updated last year
- ☆30 · Updated 7 months ago
- DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection ☆17 · Updated 11 months ago
- ☆52 · Updated last year
- Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation ☆14 · Updated 6 months ago
- ☆15 · Updated 11 months ago
- Official code repository for PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆22 · Updated 3 months ago
- A benchmark dataset for evaluating LLMs' SVG editing capabilities ☆13 · Updated 4 months ago
- Code release for F-LMM: Grounding Frozen Large Multimodal Models ☆35 · Updated last month
- ☆36 · Updated 4 months ago
- Code for the paper "Unsegment Anything by Simulating Deformation" (CVPR 2024) ☆21 · Updated 3 months ago
- Official code for the paper "Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models" (ICML 2024) ☆19 · Updated 4 months ago