LudovicTuncay / Audio-JEPA

Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Built upon the I-JEPA paradigm, it uses a Vision Transformer (ViT) backbone to predict latent representations of masked spectrogram patches.
25 · Updated last month
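
As a rough illustration of the idea in the description above, the sketch below splits a spectrogram into patches, hides a subset of them, and scores a prediction in latent space rather than on the raw spectrogram values. It is not taken from the Audio-JEPA repository: the patch size, mask ratio, uniform random masking (I-JEPA itself samples blocks), the linear projection standing in for the ViT encoder, the mean-of-context "predictor", and all function names are assumptions made for illustration only.

```python
import numpy as np

def patchify(spec, patch):
    """Split a (freq, time) spectrogram into flattened, non-overlapping square patches."""
    f, t = spec.shape
    assert f % patch == 0 and t % patch == 0, "spectrogram must tile evenly"
    gf, gt = f // patch, t // patch
    tiles = spec.reshape(gf, patch, gt, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(gf * gt, patch * patch)

def split_context_target(num_patches, mask_ratio, rng):
    """Pick masked (target) patches uniformly at random; the rest form the context."""
    perm = rng.permutation(num_patches)
    n_target = int(round(num_patches * mask_ratio))
    return np.sort(perm[n_target:]), np.sort(perm[:n_target])

def latent_loss(predicted, target):
    """Mean squared error between predicted and target latent vectors."""
    return float(np.mean((predicted - target) ** 2))

rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 256))      # toy spectrogram: 128 bins x 256 frames
patches = patchify(spec, patch=16)          # assumed 16 x 16 patches
context_idx, target_idx = split_context_target(len(patches), mask_ratio=0.5, rng=rng)

# Stand-in for the encoder: one fixed linear projection (a real model would use a ViT).
W = rng.standard_normal((patches.shape[1], 64)) / 16.0
context_latents = patches[context_idx] @ W
target_latents = patches[target_idx] @ W

# Stand-in for the predictor: guess every masked latent as the mean context latent.
predicted = np.tile(context_latents.mean(axis=0), (len(target_idx), 1))
loss = latent_loss(predicted, target_latents)
```

The point of the sketch is only the data flow: the loss compares latent vectors for the masked patches, so nothing is reconstructed in spectrogram space.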

Alternatives and similar repositories for Audio-JEPA

Users who are interested in Audio-JEPA are comparing it to the repositories listed below.
