GuyARoss / CLIP-video-search
demo natural language video db using CLIP
☆25Updated 9 months ago
Alternatives and similar repositories for CLIP-video-search
Users who are interested in CLIP-video-search are comparing it to the libraries listed below.
- Chinese CLIP models with SOTA performance.☆55Updated last year
- A multimodal image search engine built on the GME model, capable of handling diverse input types. Whether you're querying with text, imag…☆36Updated 5 months ago
- ☆68Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆26Updated last year
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p…☆14Updated last year
- Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation"☆97Updated 2 years ago
- Repository for the MM '23 accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…☆49Updated last year
- This code implements a versatile image search engine leveraging the CLIP model and FAISS, capable of processing both text-to-image and i…☆45Updated last year
- Codebase for the Recognize Anything Model (RAM)☆79Updated last year
- ☆22Updated 3 years ago
- TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision [AAAI2023 Oral]☆54Updated 2 years ago
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…☆25Updated last year
- Official code for the paper: Exploring Domain Incremental Video Highlights Detection with the LiveFood Benchmark☆37Updated last year
- Official implementation of paper AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding☆61Updated last month
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆67Updated last year
- ☆29Updated 3 years ago
- ☆115Updated last year
- The llava-Qwen2-7B-Instruct-Chinese-CLIP model improves Chinese text recognition and the understanding of meme connotations, approaching the recognition level of gpt4o and claude-3.5-sonnet!☆23Updated 10 months ago
- ☆56Updated last year
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆43Updated 5 months ago
- [CVPR 2023 Workshop] The code reproduces the results of our solutions on both tracks of the Meta AI Video Similarity Challenge (CVPR 2023 Wor…☆51Updated 2 years ago
- Image Search Application with OpenAI CLIP Model and Faiss Library☆25Updated last year
- [ICCV 2023] Accurate and Fast Compressed Video Captioning☆46Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group☆147Updated 2 weeks ago
- Facebook Image Similarity Challenge 2021☆19Updated 3 years ago
- Towards Video Text Visual Question Answering: Benchmark and Baseline☆38Updated last year
- ☆28Updated 3 years ago
- This project covers experiments and applications with Yi's multimodal model series, such as Yi-VL-6B/34B.☆13Updated last year
- CVPR2023 paper☆51Updated last year
- ☆18Updated last year