mutonix/Vript

☆161

Alternatives and similar repositories for Vript

Users who are interested in Vript are comparing it to the libraries listed below.

mira-space / MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
☆528 · Sep 2, 2024 · Updated last year

magic-research / PLLaVA
Official repository for the paper PLLaVA
☆669 · Jul 28, 2024 · Updated last year

daooshee / HD-VG-130M
The HD-VG-130M Dataset
☆126 · Apr 8, 2024 · Updated 2 years ago

Share14 / ShareGemini
☆32 · Jul 29, 2024 · Updated last year

alexandrosstergiou / Inter4K
Official repository for downloading and using Inter4K video interpolation dataset
☆50 · Dec 10, 2024 · Updated last year

WangWenhao0716 / VidProM
[NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
☆185 · Sep 26, 2024 · Updated last year

snap-research / Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
☆700 · Oct 25, 2024 · Updated last year

longvideobench / LongVideoBench
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆133 · Jul 27, 2024 · Updated last year

PKU-YuanGroup / Open-Sora-Dataset
☆116 · Jun 28, 2024 · Updated 2 years ago

GXYM / VCapsBench
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
☆20 · Jun 2, 2025 · Updated last year

EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆407 · Mar 18, 2025 · Updated last year

Vchitect / ShotBench
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
☆102 · Sep 12, 2025 · Updated 10 months ago

llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆133 · Apr 4, 2025 · Updated last year

JUNJIE99 / MLVU
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
☆263 · Apr 13, 2026 · Updated 3 months ago

Q-Future / Q-Bench
①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and vi…
☆287 · Aug 12, 2024 · Updated last year

DAMO-NLP-SG / VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
☆1,304 · Jan 23, 2025 · Updated last year

Stevetich / EventHallusion
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
☆34 · Aug 5, 2025 · Updated 11 months ago

OpenGVLab / VideoChat-Flash
[ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆527 · Updated this week

SivanDoveh / TSVLC
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models
☆47 · Sep 25, 2023 · Updated 2 years ago

patrick-tssn / VideoHallucer
VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆43 · Dec 16, 2025 · Updated 7 months ago

bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…
☆548 · Aug 14, 2025 · Updated 11 months ago

RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year

tsb0601 / MMVP
☆364 · Jan 27, 2024 · Updated 2 years ago

Vision-CAIR / MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
☆636 · Dec 10, 2024 · Updated last year

InternLM / InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,921 · May 26, 2025 · Updated last year

showlab / T2VScore
T2VScore: Towards A Better Metric for Text-to-Video Generation
☆81 · Apr 10, 2024 · Updated 2 years ago

ZrrSkywalker / MAVIS
[ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
☆156 · Dec 5, 2024 · Updated last year

PardoAlejo / MovieCuts
Learning to cut end-to-end pretrained modules
☆38 · Apr 17, 2025 · Updated last year

baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159 · Dec 6, 2024 · Updated last year

EccHui / Kodak_compression_performance
Compression performance of BPG, JPEG, JPEG2000 and Webp.
☆12 · May 15, 2019 · Updated 7 years ago

FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297 · Mar 13, 2024 · Updated 2 years ago

MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆787 · Dec 8, 2025 · Updated 7 months ago

CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year

md-mohaiminul / VideoRecap
☆208 · Jul 12, 2024 · Updated 2 years ago

hehao13 / CameraCtrl
☆657 · May 24, 2024 · Updated 2 years ago

TencentARC / ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
☆153 · Sep 10, 2024 · Updated last year

yunlong10 / VidComposition
[CVPR 2025] VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
☆30 · May 10, 2025 · Updated last year

showlab / all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281 · Mar 25, 2023 · Updated 3 years ago

baaivision / Emu3
Next-Token Prediction is All You Need
☆2,432 · Jan 12, 2026 · Updated 6 months ago