SunzeY / AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
⭐836 · Updated last month
Alternatives and similar repositories for AlphaCLIP
Users interested in AlphaCLIP are comparing it to the repositories listed below.
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ⭐846 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ⭐907 · Updated 2 weeks ago
- [Pattern Recognition 25] CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks ⭐439 · Updated 5 months ago
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ⭐488 · Updated last year
- [ECCV 2024] Tokenize Anything via Prompting ⭐591 · Updated 8 months ago
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ⭐493 · Updated 3 weeks ago
- Recent LLM-based CV and related works. Welcome to comment/contribute! ⭐870 · Updated 5 months ago
- Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight] ⭐922 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ⭐494 · Updated last year
- VisionLLM Series ⭐1,098 · Updated 5 months ago
- Experiment on combining CLIP with SAM to do open-vocabulary image segmentation. ⭐376 · Updated 2 years ago
- NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing ⭐564 · Updated 10 months ago
- ⭐637 · Updated last year
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ⭐286 · Updated 7 months ago
- ⭐543 · Updated 3 years ago
- [CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin… ⭐227 · Updated 10 months ago
- This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation … ⭐479 · Updated 5 months ago
- LLM2CLIP makes SOTA pretrained CLIP models even more SOTA. ⭐536 · Updated last month
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ⭐330 · Updated last year
- [CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining" ⭐781 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ⭐386 · Updated last year
- [ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection" ⭐727 · Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ⭐642 · Updated 6 months ago
- Official Open Source code for "Scaling Language-Image Pre-training via Masking" ⭐426 · Updated 2 years ago
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ⭐389 · Updated 3 months ago
- 【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ⭐822 · Updated last year
- Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion ⭐363 · Updated 5 months ago
- [NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models ⭐319 · Updated last year
- ⭐348 · Updated last year
- This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP. ⭐731 · Updated last year