GuyARoss / CLIP-video-search
demo natural language video db using CLIP
☆22Updated 6 months ago
Alternatives and similar repositories for CLIP-video-search:
Users who are interested in CLIP-video-search are comparing it to the libraries listed below.
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…☆47Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆24Updated last year
- ☆25Updated 5 months ago
- Facebook Image Similarity Challenge 2021☆19Updated 3 years ago
- Code for the Video Similarity Challenge.☆77Updated last year
- Vision-oriented multimodal AI☆49Updated 8 months ago
- This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"☆114Updated 3 weeks ago
- This project uses the LLaVA 1.6 multimodal model to implement text-to-image and image-to-image search.☆19Updated 11 months ago
- Chinese CLIP models with SOTA performance.☆53Updated last year
- Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra…☆53Updated last year
- ☆56Updated last year
- This repo contains extensions to DINO V2 model by Meta, and awesome applications built on top of it.☆39Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 6 months ago
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆41Updated last month
- EdgeSAM model for use with Autodistill.☆26Updated 8 months ago
- ☆18Updated 10 months ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆37Updated 5 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆81Updated 3 weeks ago
- ☆32Updated 8 months ago
- ☆26Updated 9 months ago
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone.☆62Updated 3 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 4 months ago
- ☆110Updated last year
- ☆67Updated last year
- Code and model for the AI City Challenge (CVPR 2022) Track 3 Action Detection (Naturalistic Driving Action Recognition)☆28Updated last year
- Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation"☆91Updated 2 years ago
- Video shot transition detection☆21Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group☆136Updated 7 months ago
- ☆22Updated 3 years ago
- Large Multimodal Model☆14Updated 10 months ago