roger-tseng/av-superb

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/roger-tseng/av-superb)

roger-tseng / av-superb

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)

☆58

Alternatives and similar repositories for av-superb

Users that are interested in av-superb are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ga642381 / FlappyBird
View on GitHub
Super Flappy Bird in p5.js
☆10Mar 8, 2021Updated 5 years ago
grtzsohalf / buy_vs_rent_and_invest
View on GitHub
☆15Sep 9, 2021Updated 4 years ago
ga642381 / Spoken-Dialogue-Model-Survey
View on GitHub
A survey of spoken dialogue models (SDMs) with speech input and speech output. Focus on their Intermediate Representation and Generation …
☆31Mar 24, 2026Updated 4 months ago
joannahong / AV-RelScore
View on GitHub
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35Jun 20, 2023Updated 3 years ago
dynamic-superb / dynamic-superb
View on GitHub
The official repository of Dynamic-SUPERB.
☆200Jun 24, 2025Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
Splend1d / T5lephone
View on GitHub
Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
☆19Nov 29, 2022Updated 3 years ago
ga642381 / SpeechPrompt
View on GitHub
**Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speec…
☆102Apr 10, 2025Updated last year
ga642381 / SpeechGen
View on GitHub
《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》
☆77Jun 9, 2023Updated 3 years ago
voidful / llm-codec
View on GitHub
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23May 3, 2026Updated 2 months ago
ckyang1124 / LALM-Evaluation-Survey
View on GitHub
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
☆41Aug 11, 2025Updated 11 months ago
nervjack2 / Speech2Unit
View on GitHub
☆13Sep 25, 2024Updated last year
jasonppy / word-discovery
View on GitHub
Word Discovery in Visually Grounded, Self-Supervised Speech Models
☆27Dec 4, 2023Updated 2 years ago
stoneMo / OneAVM
View on GitHub
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12Jun 1, 2023Updated 3 years ago
Alexander-H-Liu / dinosr
View on GitHub
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
☆53Jan 18, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
facebookresearch / MAViL
View on GitHub
The repo host the code and model of MAViL.
☆45Jul 24, 2023Updated 3 years ago
GenjiB / LAVISH
View on GitHub
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆107Aug 11, 2023Updated 2 years ago
Exgc / AVMuST-TED
View on GitHub
☆24Mar 30, 2024Updated 2 years ago
ga642381 / SpeechPrompt-v2
View on GitHub
《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》Speech processing with prompting paradigm
☆81Oct 19, 2023Updated 2 years ago
facebookresearch / av_hubert
View on GitHub
A self-supervised learning framework for audio-visual speech
☆995Dec 7, 2023Updated 2 years ago
YuanGongND / uavm
View on GitHub
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
☆57Apr 20, 2023Updated 3 years ago
yistLin / beamertheme-yeast
View on GitHub
Yeast, a lite and light beamer theme
☆18Dec 6, 2020Updated 5 years ago
ckyang1124 / SAKURA
View on GitHub
Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Informa…
☆25Aug 14, 2025Updated 11 months ago
yangdongchao / ALMTokenizer2
View on GitHub
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆45Sep 5, 2025Updated 10 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
praveena2j / Cross-Attentional-AV-Fusion
View on GitHub
FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition
☆34Nov 29, 2024Updated last year
ga642381 / Taiwanese-Translation
View on GitHub
Taiwanese Translation with BERT based model and RNN. Collection of Taiwanese text corpus
☆13Oct 15, 2022Updated 3 years ago
ga642381 / AudioCodec-Hub
View on GitHub
AudioCodec-Hub is a Python library for encoding and decoding audio data, supporting various neural audio codec models
☆25Sep 26, 2023Updated 2 years ago
nobel861017 / Conv-TasNet
View on GitHub
A PyTorch implementation of Conv-TasNet described in "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" with Permuta…
☆11Aug 8, 2020Updated 5 years ago
voidful / Codec-SUPERB
View on GitHub
Audio Codec Speech processing Universal PERformance Benchmark
☆308Jul 4, 2026Updated 3 weeks ago
jeffeuxMartin / meta-learning-hlp
View on GitHub
A publishing website of a table collecting meta-learning-related papers in the area of human language processing.
☆17Aug 2, 2021Updated 4 years ago
GeWu-Lab / awesome-audiovisual-learning
View on GitHub
A curated list of audio-visual learning methods and datasets.
☆288Dec 3, 2024Updated last year
YuanGongND / cav-mae
View on GitHub
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
☆292Mar 20, 2024Updated 2 years ago
dynamic-superb / multimodal-llama
View on GitHub
The official implementation of ImageBind-LLM and Whisper-LLM from the paper "Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Compre…
☆21Oct 30, 2023Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
JishengBai / AudioSetCaps
View on GitHub
A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline
☆208Dec 13, 2024Updated last year
nervjack2 / MelHuBERT
View on GitHub
Official implementation of MelHuBERT
☆70Feb 21, 2026Updated 5 months ago
ga642381 / Taiwanese-Speech-Synthesis
View on GitHub
Taiwanese Speech Synthesis with Tacotron2
☆26Oct 2, 2022Updated 3 years ago
ChanganVR / action2sound
View on GitHub
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
☆26Oct 1, 2024Updated last year
DanielMengLiu / AudioVisualLip
View on GitHub
☆25Feb 20, 2024Updated 2 years ago
cmpute / audio-codec-benchmark
View on GitHub
Comprehensive quantitative comparison of lossless and lossy audio codecs
☆41Feb 11, 2023Updated 3 years ago
cyhuang-tw / robust-vc
View on GitHub
☆11May 7, 2022Updated 4 years ago