GuyARoss / CLIP-video-search
demo natural language video db using CLIP
☆26Updated last year
Alternatives and similar repositories for CLIP-video-search
Users that are interested in CLIP-video-search are comparing it to the libraries listed below
- Chinese CLIP models with SOTA performance.☆56Updated last year
- ☆70Updated 2 years ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆27Updated last year
- Toward Universal Multimodal Embedding☆51Updated last week
- Low-latency ONNX and TensorRT based zero-shot classification and detection with contrastive language-image pre-training based prompts☆42Updated 11 months ago
- Our 2nd-gen LMM☆34Updated last year
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101Updated last year
- Large Multimodal Model☆15Updated last year
- Code for the Video Similarity Challenge.☆81Updated last year
- Code and model for the AI City Challenge (CVPR 2022) Track 3 Action Detection (Naturalistic Driving Action Recognition)☆28Updated 2 years ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆37Updated 11 months ago
- A roundup of domestic and international data competition news☆18Updated 3 years ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…☆50Updated last year
- ☆29Updated 3 years ago
- This project covers experiments with and applications of Yi's multimodal model series, such as Yi-VL-6B/34B.☆14Updated last year
- Facebook Image Similarity Challenge 2021☆19Updated 3 years ago
- Research Code for Multimodal-Cognition Team in Ant Group☆162Updated last month
- ☆28Updated 3 years ago
- [ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance☆98Updated last year
- TagGPT: Large Language Models are Zero-shot Multimodal Taggers☆63Updated 2 years ago
- Exploration of the multi modal fuyu-8b model of Adept. 🤓 🔍☆28Updated last year
- 2nd place solution to Google Universal Image Embedding Challenge!☆43Updated 2 years ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p…☆14Updated last year
- ☆57Updated last year
- Codebase for the Recognize Anything Model (RAM)☆82Updated last year
- Effective frame sampling for ML applications.☆20Updated 2 months ago
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08…☆29Updated 2 months ago
- 1st Place Solution in Google Universal Image Embedding☆67Updated 2 years ago
- Using open-source LLM Llama2 by Meta on local CPU inference for document question-and-answer☆15Updated last year
- A simple Python tool to extract key-frames from a video file using peak estimation from frame difference.☆179Updated last month