bytedance / video-SALMONN-2

video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions. It is developed by the Department of Electronic Engineering at Tsinghua University and ByteDance.
45 · Updated last week

Alternatives and similar repositories for video-SALMONN-2

Users who are interested in video-SALMONN-2 are comparing it to the libraries listed below.
