AskYoutubeAI / AskVideos-VideoCLIP
☆67Updated 4 months ago
Alternatives and similar repositories for AskVideos-VideoCLIP:
Users who are interested in AskVideos-VideoCLIP are comparing it to the libraries listed below:
- Shot2Story, a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.☆119Updated 3 weeks ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties☆118Updated 3 months ago
- ☆69Updated 5 months ago
- ☆63Updated 3 weeks ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆145Updated 3 weeks ago
- ☆160Updated 4 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…☆82Updated this week
- ☆63Updated 5 months ago
- Easily compute CLIP embeddings from video frames☆142Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆41Updated 6 months ago
- ☆68Updated 2 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆49Updated 3 weeks ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆41Updated last month
- ☆89Updated last year
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.☆224Updated last year
- ☆64Updated last year
- ☆56Updated 9 months ago
- ☆65Updated last year
- ☆175Updated 7 months ago
- ☆48Updated last year
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…☆76Updated 2 years ago
- ☆72Updated 9 months ago
- Implementation of the premier Text to Video model from OpenAI☆57Updated 3 months ago
- An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23)☆52Updated last year
- Incredibly descriptive audiovisual summaries for videos☆40Updated 6 months ago
- Recaption large (Web)Datasets with vLLM and save the artifacts.☆44Updated 3 months ago
- Video-LLaVA fine-tune for CinePile evaluation☆47Updated 6 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"☆48Updated 2 months ago
- Supercharged BLIP-2 that can handle videos☆117Updated last year
- Official code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆137Updated 3 months ago