patrick-tssn/VSTAR

[ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information

☆16

Alternatives and similar repositories for VSTAR

Users who are interested in VSTAR are comparing it to the repositories listed below.

OmniMMI / M4
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
☆18 · Apr 2, 2025 · Updated last year
OmniMMI / OmniMMI
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
☆23 · Jul 14, 2026 · Updated last week
hengRUC / VSP
☆24 · Sep 24, 2023 · Updated 2 years ago
Daria8976 / MMAD
We propose MMAD, a novel automated pipeline for precise AD generation. MMAD introduces ambient music alongside visual and linguistic, enh…
☆17 · Dec 31, 2024 · Updated last year
bigai-nlco / CREAM
[NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
☆22 · Oct 10, 2024 · Updated last year
alexa / kilm
☆23 · Jun 12, 2023 · Updated 3 years ago
NUSTM / COQE
☆14 · Oct 27, 2023 · Updated 2 years ago
oxai / visogender
☆13 · May 10, 2025 · Updated last year
TencentYoutuResearch / HighlightDetection-CLC
Code for CVPR2023 paper "Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies"
☆18 · Mar 21, 2023 · Updated 3 years ago
Vincent-ZHQ / Comprehensive-Long-Video-Understanding-Survey
A survey on MM-LLMs for long video understanding: From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long…
☆23 · Sep 12, 2025 · Updated 10 months ago
TencentYoutuResearch / SceneSegmentation-SCRL
Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation"
☆112 · Feb 14, 2023 · Updated 3 years ago
MatthieuFP / VGAMT
☆12 · Oct 12, 2024 · Updated last year
bigai-nlco / VideoTGB
[EMNLP 2024] A Video Chat Agent with Temporal Prior
☆33 · Mar 2, 2025 · Updated last year
Hongcheng-Gao / HAVEN
Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆25 · Oct 22, 2025 · Updated 9 months ago
YangLiu9208 / TCGL
[IEEE T-IP 2022] TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning
☆24 · Dec 19, 2023 · Updated 2 years ago
vision-x-nyu / vstat
Evaluation code for "Benchmarking Visual State Tracking in Multimodal Video Understanding"
☆35 · Jun 3, 2026 · Updated last month
BlueZeros / ReflecTool
Benchmark, Toolbox, and Reflection-based Method for Clinical Agent
☆22 · Nov 6, 2024 · Updated last year
scofield7419 / UMMT-VSH
Code for the ACL 2023 paper Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Sc…
☆12 · May 19, 2023 · Updated 3 years ago
bigai-nlco / VideoLLaMB
[ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
☆87 · Feb 27, 2025 · Updated last year
penn-nlp / mmid
Words and their images in 98 languages
☆14 · Mar 1, 2019 · Updated 7 years ago
PINTO0309 / tflite2json2tflite
Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary.
☆28 · Feb 21, 2023 · Updated 3 years ago
jayusxp / UECA-Prompt
UECA-Prompt: Universal Prompt for Emotion Cause Analysis (COLING 2022)
☆16 · Jun 6, 2023 · Updated 3 years ago
OmniMMI / OpenOmniNexus
a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
☆38 · Apr 7, 2025 · Updated last year
liutaocode / AwesomeDiarizationDataset
Both audio-only and audio-visual speaker diarization datasets are listed here.
☆16 · Feb 22, 2023 · Updated 3 years ago
Pascalson / LERG
A unified approach to explain conditional text generation models. Pytorch. The code of paper "Local Explanation of Dialogue Response Gene…
☆16 · Mar 21, 2022 · Updated 4 years ago
WinnieHAN / structure_adv
☆10 · Oct 28, 2020 · Updated 5 years ago
qizhou000 / RECIPE
[EMNLP 2024 poster] Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning
☆16 · Dec 17, 2024 · Updated last year
amazon-science / iwslt-autodub-task
☆21 · Mar 4, 2024 · Updated 2 years ago
usc-sail / mica-MovieCLIP
This repository contains the codebase for MovieCLIP: Visual Scene Recognition in Movies
☆43 · Oct 1, 2023 · Updated 2 years ago
zilongzheng / PatchGenCN
CVPR 2021 Oral Paper PatchGenCN
☆11 · Oct 28, 2021 · Updated 4 years ago
patrick-0817 / T-MASS-dataleakage
☆10 · Nov 27, 2024 · Updated last year
iwangjian / Color4Dial
Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue (ACL Findings 2023)
☆21 · Nov 10, 2025 · Updated 8 months ago
ncherel / infusion
Internal diffusion for video inpainting
☆16 · May 19, 2025 · Updated last year
ChrisAllenMing / Cross_Category_Video_Highlight
Implementation of Cross-category Video Highlight Detection via Set-based Learning (ICCV 2021).
☆81 · Aug 27, 2021 · Updated 4 years ago
OpenGVLab / VKnowU
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆15 · Feb 3, 2026 · Updated 5 months ago
zhaoxing2022 / MMN-VSOD
☆15 · Jan 9, 2024 · Updated 2 years ago
iLearn-Lab / CVPR25-LION-FS
[CVPR 2025] LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
☆29 · Dec 2, 2025 · Updated 7 months ago
FX-STAR / Coding_daily
display daily code
☆28 · Sep 25, 2019 · Updated 6 years ago
yunzhuzhang0918 / flexselect
The official repository for paper "FlexSelect: Flexible Token Selection for Efficient Long Video Understanding".
☆31 · Sep 19, 2025 · Updated 10 months ago