Supercharged BLIP-2 that can handle videos
☆124 · Dec 1, 2023 · Updated 2 years ago
Alternatives and similar repositories for VideoBLIP
Users who are interested in VideoBLIP are comparing it to the libraries listed below.
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties ☆132 · Nov 10, 2024 · Updated last year
- This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me… ☆13 · May 25, 2023 · Updated 2 years ago
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆144 · Jan 22, 2024 · Updated 2 years ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon! ☆11 · May 24, 2023 · Updated 2 years ago
- ☆17 · Jul 30, 2024 · Updated last year
- ☆39 · Dec 4, 2023 · Updated 2 years ago
- The HD-VG-130M Dataset ☆125 · Apr 8, 2024 · Updated last year
- ☆13 · Jul 20, 2024 · Updated last year
- Partially Non-Autoregressive Image Captioning ☆10 · Sep 30, 2021 · Updated 4 years ago
- Official Repo for Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation ☆30 · Mar 29, 2024 · Updated last year
- Official Code for DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents (Findings of EMNL… ☆22 · Oct 24, 2023 · Updated 2 years ago
- ☆17 · Dec 13, 2023 · Updated 2 years ago
- team Doggeee's solution to Ego4D LTA challenge@CVPRW23' ☆13 · Nov 4, 2023 · Updated 2 years ago
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation) ☆16 · Jan 18, 2024 · Updated 2 years ago
- List of papers on Hallucination in LMM ☆10 · Nov 29, 2023 · Updated 2 years ago
- Code for EMNLP 2022 Paper DANLI: Deliberative Agent for Following Natural Language Instructions ☆18 · May 1, 2025 · Updated 10 months ago
- Official code for the ACL 2021 Findings paper "Yichi Zhang and Joyce Chai. Hierarchical Task Learning from Language Instructions with Uni… ☆24 · Jun 28, 2021 · Updated 4 years ago
- An automatic MLLM hallucination detection framework ☆19 · Sep 26, 2023 · Updated 2 years ago
- ORES: Open-vocabulary Responsible Visual Synthesis ☆14 · Dec 12, 2023 · Updated 2 years ago
- FlexiFilm: Long Video Generation with Flexible Conditions ☆31 · May 1, 2024 · Updated last year
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023) ☆228 · Jul 21, 2023 · Updated 2 years ago
- [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding ☆3,136 · Jun 4, 2024 · Updated last year
- ☆19 · Sep 19, 2024 · Updated last year
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks. ☆20 · May 7, 2022 · Updated 3 years ago
- [ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models ☆346 · May 27, 2024 · Updated last year
- [ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision ☆12 · Sep 17, 2023 · Updated 2 years ago
- "From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models" [Uziel, Dinari, and Freifeld, NeurIPS 20… ☆13 · Jan 16, 2024 · Updated 2 years ago
- Code release for "Learning Video Representations from Large Language Models" ☆534 · Oct 1, 2023 · Updated 2 years ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆35 · Nov 13, 2024 · Updated last year
- The benchmark for "Video Object Segmentation in Panoptic Wild Scenes". ☆12 · Oct 17, 2023 · Updated 2 years ago
- ☆24 · Feb 17, 2026 · Updated last month
- A Datasette instance for searching WebVid-10M ☆15 · Sep 30, 2022 · Updated 3 years ago
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model' ☆33 · Nov 7, 2023 · Updated 2 years ago
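- LTX-Video-Trainer-GUI is a GUI tool for training LoRA models for LTX-Video, letting users train LoRA models for video generation through a simple interface. The trainer provides an intuitive GUI so users can easily configure and launch the training process without writing complex code. ☆13 · Jul 18, 2025 · Updated 8 months ago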
- Official code for "Audio-Guided Attention Network for Weakly Supervised Violence Detection" (ICCECE2022). ☆13 · Mar 25, 2022 · Updated 4 years ago
- ☆119 · Feb 19, 2024 · Updated 2 years ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆25 · Nov 23, 2024 · Updated last year
- ☆101 · May 16, 2024 · Updated last year
- ☆206 · Jul 12, 2024 · Updated last year