xjchenGit/MTDVocaLiST

Official repository for the paper Multimodal Transformer Distillation for Audio-Visual Synchronization (ICASSP 2024).

☆29

Alternatives and similar repositories for MTDVocaLiST

Users who are interested in MTDVocaLiST are comparing it to the libraries listed below.

vskadandale / vocalist
Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
☆73 · Apr 7, 2024 · Updated 2 years ago
isjwdu / DFADD
Official implementation and dataset for the paper DFADD: The Diffusion and Flow-matching based Audio Deepfake Dataset
☆16 · Apr 7, 2025 · Updated last year
ga642381 / Spoken-Dialogue-Model-Survey
A survey of spoken dialogue models (SDMs) with speech input and speech output. Focus on their Intermediate Representation and Generation …
☆31 · Mar 24, 2026 · Updated 4 months ago
WWWWxp / Speech-Tokenizer-Papers
This repository collects papers related to Speech Tokenizer.
☆18 · Oct 16, 2024 · Updated last year
lingjzhu / spoken_sent_embedding
Unsupervised spoken sentence embeddings
☆14 · Dec 14, 2022 · Updated 3 years ago
xjchenGit / SingGraph
Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024).
☆24 · Sep 19, 2025 · Updated 10 months ago
ag027592 / EMO-SUPERB
EMO-SUPERB: a reproducible speech emotion recognition benchmark with leakage-free splits for 6 datasets and 15 speech SSL models (IEEE SL…
☆51 · Updated this week
xjchenGit / awesome-audio-visual-deepfake
awesome-audio-visual-robustness
☆11 · Jan 27, 2024 · Updated 2 years ago
ga642381 / SpeechGen
"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"
☆77 · Jun 9, 2023 · Updated 3 years ago
yzGuu830 / efficient-speech-codec
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126 · Mar 20, 2025 · Updated last year
yangdongchao / ALMTokenizer2
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆45 · Sep 5, 2025 · Updated 10 months ago
hieuthi / LlamaPartialSpoof
A fully and partially fake speech dataset for evaluation
☆15 · Nov 11, 2025 · Updated 8 months ago
LingweiMeng / MyChatGPT
A casual and simple ChatGPT Python script that runs in the terminal (as long as you have API access). Supports the Azure API.
☆20 · May 3, 2025 · Updated last year
B06901052 / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆13 · Oct 11, 2022 · Updated 3 years ago
jiaqili3 / DualCodec
[Interspeech 2025] DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec
☆72 · Mar 11, 2026 · Updated 4 months ago
thepowerfuldeez / rvc-trainer
☆12 · Mar 28, 2024 · Updated 2 years ago
shivammehta25 / Diff-TTSG
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
☆40 · Sep 14, 2023 · Updated 2 years ago
geomachine / geomachine
Config files for my GitHub profile.
☆11 · Updated this week
voidful / llm-codec
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23 · May 3, 2026 · Updated 2 months ago
xieyuankun / FSD-Dataset
This repository presents the FSD dataset for song deepfake detection.
☆24 · Aug 18, 2025 · Updated 11 months ago
yangdongchao / ALMTokenizer
The demo page for ALMTokenizer
☆59 · Apr 14, 2025 · Updated last year
KylinYee / R2-Talker-code
R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning
☆82 · Jan 3, 2024 · Updated 2 years ago
TrongChuongDao / nsacyber.github.io
NSA Cybersecurity. Formerly known as NSA Information Assurance and the Information Assurance Directorate
☆10 · Jul 7, 2022 · Updated 4 years ago
cylin-cmlab / GCT-Prediction
This is the official supplementary document for the GCT data and its prediction task.
☆10 · Feb 19, 2024 · Updated 2 years ago
nttcslab-sp / mamba-diarization
Official repository for Mamba-based Segmentation Model for Speaker Diarization
☆47 · May 13, 2025 · Updated last year
X-LANCE / LSCodec-Inference
Inference code for Interspeech 2025 paper, "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec"
☆36 · Oct 23, 2025 · Updated 9 months ago
ga642381 / AudioCodec-Hub
AudioCodec-Hub is a Python library for encoding and decoding audio data, supporting various neural audio codec models
☆25 · Sep 26, 2023 · Updated 2 years ago
Mddct / transformer-vocos
☆35 · Sep 6, 2025 · Updated 10 months ago
aniketp02 / wav2lip_144x144
☆10 · Feb 17, 2023 · Updated 3 years ago
DavideGioiosa / cvae-chord-generation-complexity
Modeling Harmonic Complexity using two models of Conditional Variational Autoencoders - MSc. Thesis
☆10 · May 16, 2023 · Updated 3 years ago
yongyizang / SingFake
Official repository for "SingFake: Singing Voice Deepfake Detection"
☆64 · Feb 26, 2024 · Updated 2 years ago
roger-tseng / av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58 · Apr 17, 2024 · Updated 2 years ago
ByronHsu / FlyteGPT
🦅🔗 Building FlyteGPT on Flyte with LangChain
☆30 · Jan 23, 2024 · Updated 2 years ago
yzyouzhang / Audio_Research_in_US
Audio Research in US. US-based professors who work on audio (music, speech, acoustics). For students who would like to apply for RA, PhD,…
☆27 · Feb 27, 2026 · Updated 5 months ago
Ereboas / MagiCodec
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆125 · Jun 4, 2025 · Updated last year
BinWang28 / AnomalyHop
IEEE VCIP 2021: AnomalyHop: An SSL-based Image Anomaly Localization Method
☆14 · Sep 18, 2021 · Updated 4 years ago
dreamtheater123 / VoxEval
GitHub repository for the ACL 2025 paper VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
☆24 · Jun 16, 2025 · Updated last year
jk4freedom / jk4freedom
About Us
☆19 · Mar 30, 2024 · Updated 2 years ago
app-stories-integration / app-story-ui
☆10 · Aug 23, 2021 · Updated 4 years ago