bytedance / video-SALMONN-2
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions. It is developed by the Department of Electronic Engineering at Tsinghua University and ByteDance.
168 · Feb 23, 2026 · Updated last month

Alternatives and similar repositories for video-SALMONN-2

Users who are interested in video-SALMONN-2 are comparing it to the libraries listed below.
