GuyARoss / CLIP-video-search
Demo of a natural-language video database using CLIP
☆19Updated last month
Related projects:
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆35Updated last week
- Chinese CLIP models with SOTA performance.☆44Updated last year
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…☆36Updated 8 months ago
- ☆53Updated 7 months ago
- This code implements a versatile image search engine leveraging the CLIP model and FAISS, capable of processing both text-to-image and i…☆37Updated 8 months ago
- Vision-oriented multimodal AI☆49Updated 3 months ago
- Video classification annotation and video spatio-temporal annotation☆27Updated last year
- ☆15Updated 8 months ago
- ☆29Updated 3 months ago
- Video shot transition detection☆21Updated last year
- ☆63Updated last year
- TagGPT: Large Language Models are Zero-shot Multimodal Taggers☆59Updated last year
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆64Updated 2 weeks ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…☆33Updated 11 months ago
- AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection - CVPR NAS 2023☆98Updated last year
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p…☆14Updated 7 months ago
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it☆130Updated 7 months ago
- ☆54Updated 3 weeks ago
- Feature extraction and feature extractor training for Soccernet videos.☆18Updated 6 months ago
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.☆50Updated 2 months ago
- Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation"☆87Updated last year
- ☆25Updated 3 years ago
- Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆38Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence☆10Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group☆111Updated 2 months ago
- ☆10Updated this week
- ☆32Updated 2 years ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆45Updated 4 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆51Updated 8 months ago
- Modify-Anything is based on yolov5 and yolov8 for video and image detection. Segment-anything and lama_cleaner are applied to segment, modify, era…☆9Updated last year