yukw777/VideoBLIP

Supercharged BLIP-2 that can handle videos

☆124

Alternatives and similar repositories for VideoBLIP

Users who are interested in VideoBLIP are comparing it to the repositories listed below.

yukw777 / EILEV
EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties
☆132 · Nov 10, 2024 · Updated last year
dmoltisanti / air-cvpr23
This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…
☆13 · May 25, 2023 · Updated 2 years ago
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 2 years ago
anonymous0769 / DreamVideo
☆17 · Jul 30, 2024 · Updated last year
shuheikurita / RefEgo
☆13 · Jul 20, 2024 · Updated last year
alimama-creative / Noise-Rectification
Official Repo for Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation
☆30 · Mar 29, 2024 · Updated last year
ExponentialML / Video-BLIP2-Preprocessor
A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it
☆144 · Jan 22, 2024 · Updated 2 years ago
AlonMendelson / SGVL
☆17 · Dec 13, 2023 · Updated 2 years ago
ilkerkesen / ViLMA
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)
☆16 · Jan 18, 2024 · Updated 2 years ago
vgbench / VGBench
☆19 · Sep 19, 2024 · Updated last year
ChenHsing / VIDiff
☆39 · Dec 4, 2023 · Updated 2 years ago
junyangwang0410 / HaELM
An automatic MLLM hallucination detection framework
☆19 · Sep 26, 2023 · Updated 2 years ago
facebookresearch / PartDistillation
Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation"
☆60 · Dec 17, 2023 · Updated 2 years ago
X-PLUG / mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
☆228 · Jul 21, 2023 · Updated 2 years ago
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆45 · Nov 29, 2023 · Updated 2 years ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆43 · Mar 11, 2025 · Updated 11 months ago
daooshee / HD-VG-130M
The HD-VG-130M Dataset
☆124 · Apr 8, 2024 · Updated last year
594zyc / HiTUT
Official code for the ACL 2021 Findings paper "Yichi Zhang and Joyce Chai. Hierarchical Task Learning from Language Instructions with Uni…
☆24 · Jun 28, 2021 · Updated 4 years ago
zhaoyue-zephyrus / AVION
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
☆138 · Aug 23, 2025 · Updated 6 months ago
joshhales1 / Minecraft-Crafting-Web
A project designed to build and render a full Minecraft crafting tree.
☆10 · Aug 10, 2021 · Updated 4 years ago
jxzhangjhu / awesome-LMM-Hallucination
List of papers on Hallucination in LMM
☆10 · Nov 29, 2023 · Updated 2 years ago
Chenglin-Yang / LESA_classification
Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms
☆11 · Nov 29, 2021 · Updated 4 years ago
allenai / PlaSma
This is a repository for paper titled, PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Plann…
☆14 · Nov 3, 2023 · Updated 2 years ago
alibaba / wan-toy-transform
This is a LoRA model finetuned on Wan-I2V-14B-480P. It turns things in the image into fluffy toys.
☆19 · Nov 10, 2025 · Updated 3 months ago
rtous / lester
☆24 · Feb 17, 2026 · Updated 2 weeks ago
chenpipi0807 / LTX-Video-Trainer-GUI
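LTX-Video-Trainer-GUI is a GUI tool for training LTX video LoRA models. It supports training LoRA models for video generation through a simple interface. The trainer provides an intuitive GUI that lets users easily configure and launch the training process without writing complex code.
☆13 · Jul 18, 2025 · Updated 7 months ago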
e-bug / fine-grained-evals
[ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"
☆13 · Jun 11, 2023 · Updated 2 years ago
DAMO-NLP-SG / Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
☆3,128 · Jun 4, 2024 · Updated last year
Y-ichen / FlexiFilm
FlexiFilm: Long Video Generation with Flexible Conditions
☆31 · May 1, 2024 · Updated last year
mbzuai-oryx / Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the cap…
☆1,492 · Aug 5, 2025 · Updated 7 months ago
m-bain / webvid
Large-scale text-video dataset. 10 million captioned short videos.
☆677 · Aug 14, 2024 · Updated last year
multimodal-art-projection / IV-Bench
☆13 · Apr 23, 2025 · Updated 10 months ago
princeton-pli / VLM_S2H
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
☆16 · Jun 3, 2025 · Updated 9 months ago
simonw / webvid-datasette
A Datasette instance for searching WebVid-10M
☆15 · Sep 30, 2022 · Updated 3 years ago
stas00 / python-tools
Python tools
☆14 · Oct 22, 2023 · Updated 2 years ago
Naozumi520 / g2pW-Cantonese
Cantonese Grapheme-to-Phoneme Converter based on GitYCC/g2pW
☆15 · Dec 10, 2024 · Updated last year
rui-qian / UGround
UGround: Towards Unified Visual Grounding with Unrolled Transformers
☆21 · Feb 15, 2026 · Updated 2 weeks ago
jalayrac / object-states-action
Code for the paper Joint Discovery of Object States and Manipulation Actions, ICCV 2017
☆14 · Aug 7, 2018 · Updated 7 years ago
JCruan519 / GIST
(ACM MM24) This is the official repository of GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction.
☆11 · Jan 28, 2024 · Updated 2 years ago