SunzeY / AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
⭐841 · Updated last month
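
For context, Alpha-CLIP keeps the familiar CLIP image-text matching interface but adds a per-pixel alpha mask so the image encoder focuses on a user-specified region. The sketch below shows the underlying CLIP-style workflow using OpenAI's standard `clip` package; Alpha-CLIP ships a drop-in variant of this API, but the mask-specific arguments and checkpoint paths vary by release, so treat those parts as assumptions and defer to the repo's README.

```python
import torch
import clip  # OpenAI's CLIP package; Alpha-CLIP provides a drop-in variant
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)

with torch.no_grad():
    # Alpha-CLIP's encoder additionally consumes an alpha mask alongside the
    # image to steer attention to a chosen region; the exact argument name
    # and mask preprocessing are release-specific (hypothetical here).
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1).cpu()

print(probs)  # relative similarity of the image to each candidate caption
```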
Alternatives and similar repositories for AlphaCLIP
Users interested in AlphaCLIP are comparing it to the libraries listed below.
- [ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ⭐850 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ⭐912 · Updated last month
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ⭐497 · Updated last year
- [Pattern Recognition 25] CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks ⭐440 · Updated 6 months ago
- [ECCV 2024] Tokenize Anything via Prompting ⭐592 · Updated 9 months ago
- Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight] ⭐924 · Updated last year
- Recent LLM-based CV and related works. Welcome to comment/contribute! ⭐872 · Updated 6 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ⭐495 · Updated last year
- Project page for "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ⭐507 · Updated last month
- [CVPR 2024] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin… ⭐228 · Updated 11 months ago
- NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing ⭐567 · Updated 10 months ago
- LLM2CLIP makes SOTA pretrained CLIP models even more SOTA ⭐541 · Updated 2 months ago
- VisionLLM Series ⭐1,105 · Updated 6 months ago
- ⭐544 · Updated 3 years ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ⭐331 · Updated last year
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ⭐289 · Updated 7 months ago
- [NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models ⭐319 · Updated last year
- [CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining" ⭐786 · Updated last year
- ⭐634 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ⭐389 · Updated last year
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" ⭐205 · Updated 8 months ago
- [ECCV 2024] VideoMamba: State Space Model for Efficient Video Understanding ⭐992 · Updated last year
- [ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection" ⭐729 · Updated last year
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ⭐644 · Updated 7 months ago
- [CVPR 2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments" ⭐286 · Updated last year
- ⭐350 · Updated last year
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ⭐392 · Updated 4 months ago
- [ICCV 2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation ⭐383 · Updated last year
- Experiment on combining CLIP with SAM to do open-vocabulary image segmentation. ⭐377 · Updated 2 years ago
- VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling ⭐465 · Updated 3 months ago