X-LANCE/public_talks

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/X-LANCE/public_talks)

X-LANCE / public_talks

Materials of public talks given By SJTU X-LANCE members

☆14

Alternatives and similar repositories for public_talks

Users that are interested in public_talks are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Hunterhuan / sphereface2_speaker_verification
View on GitHub
Exploring Binary Classification Loss for Speaker Verification
☆18Jul 18, 2023Updated 3 years ago
HuangZiliAndy / SSL_for_multitalker
View on GitHub
ADAPTING SELF-SUPERVISED MODELS TO MULTI-TALKER SPEECH RECOGNITION USING SPEAKER EMBEDDINGS
☆33Mar 16, 2023Updated 3 years ago
RicherMans / PSL
View on GitHub
Source code for ICASSP2022 "Pseudo Strong labels for large scale weakly supervised audio tagging"
☆31Apr 29, 2022Updated 4 years ago
LAION-AI / emotional-speech-annotations
View on GitHub
This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models
☆35Oct 13, 2024Updated last year
cpdu / unicats
View on GitHub
☆63Jan 15, 2024Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
X-LANCE / VQTalker
View on GitHub
[AAAI 2025] VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization
☆53Dec 16, 2024Updated last year
RicherMans / Dasheng
View on GitHub
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
☆86Nov 7, 2025Updated 8 months ago
xiaomi-research / acavcaps
View on GitHub
☆31Mar 27, 2026Updated 3 months ago
the-bird-F / GLM-Voice-RAG
View on GitHub
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31Jul 11, 2025Updated last year
RicherMans / GPV
View on GitHub
Repository for our Interspeech2020 general-purpose voice activity detection (GPVAD) paper
☆141Aug 3, 2023Updated 2 years ago
mutiann / speech_rankings
View on GitHub
A CSRankings-like index for speech researchers
☆35Oct 16, 2024Updated last year
X-LANCE / KWStreamingSearch
View on GitHub
☆94Jun 25, 2025Updated last year
azraelkuan / repgan
View on GitHub
RepVgg + HiFiGAN
☆36Aug 10, 2022Updated 3 years ago
Speech-Lab-IITM / data2vec-aqc
View on GitHub
Repository having the code and models from the paper: data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student traini…
☆13Mar 18, 2024Updated 2 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
awthomp / cusignal-icassp-tutorial
View on GitHub
4 Hour cuSignal Tutorial - ICASSP 2021 Notebooks
☆49Jun 7, 2021Updated 5 years ago
xiaomi-research / dasheng-audiogen
View on GitHub
end-to-end text to audio scene generation model
☆50Jun 16, 2026Updated last month
X-LANCE / BER
View on GitHub
Balanced Error Rate for Speaker Diarization
☆32Feb 28, 2023Updated 3 years ago
SonyResearch / VRVQ
View on GitHub
Variable Bitrate Residual Vector Quantization for Audio Coding
☆54May 1, 2025Updated last year
jishengpeng / WavReward
View on GitHub
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
☆56May 15, 2025Updated last year
wenet-e2e / wesep
View on GitHub
Target Speaker Extraction Toolkit
☆299Oct 4, 2025Updated 9 months ago
csukuangfj / transducer-loss-benchmarking
View on GitHub
☆67Mar 25, 2022Updated 4 years ago
RicherMans / Datadriven-GPVAD
View on GitHub
The codebase for Data-driven general-purpose voice activity detection.
☆93Aug 3, 2023Updated 2 years ago
1ytic / edit-distance-papers
View on GitHub
A curated list of papers dedicated to edit-distance as objective function
☆53Aug 22, 2020Updated 5 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
X-LANCE / VoiceFlow-TTS
View on GitHub
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
☆376Sep 3, 2024Updated last year
HY-SpongeBob / HY-SpongeBob
View on GitHub
☆26May 26, 2026Updated 2 months ago
facebookresearch / AudioDec
View on GitHub
An Open-source Streaming High-fidelity Neural Audio Codec
☆510Mar 4, 2025Updated last year
faroit / python_audio_loading_benchmark
View on GitHub
Benchmark popular audio i/o packages
☆152Dec 19, 2023Updated 2 years ago
zjlww / papers
View on GitHub
Connected Papers knockoff, managing academic papers and citations with graph database.
☆12Dec 26, 2023Updated 2 years ago
RicherMans / AudioCaption
View on GitHub
Dataset and baseline for the first Audiocaption task
☆79Jul 25, 2024Updated 2 years ago
jimbozhang / xares
View on GitHub
A benchmark for evaluating audio encoders on various audio tasks.
☆55Apr 27, 2026Updated 2 months ago
noiseux1523 / NIST-SRE-2019
View on GitHub
Score Normalization for NIST 2019 Speaker Recognition Evaluation
☆10Nov 8, 2019Updated 6 years ago
XiaoMi / dasheng
View on GitHub
Official PyTorch code for Deep Audio-Signal Holistic Embeddings
☆200Nov 7, 2025Updated 8 months ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
a-nagrani / VoxSRC2020
View on GitHub
Development Toolkit for the VoxCeleb Speaker Recognition Challenge 2020
☆43Jul 17, 2020Updated 6 years ago
Ruiqi-Yan / Awesome-Audio-Editing
View on GitHub
A curated list of models, benchmarks, tools and guides for audio editing
☆34Jul 7, 2026Updated 2 weeks ago
ohlionel / Prune-Tune
View on GitHub
Official code repository for AAAI2021 paper Finding Sparse Structures for Domain Specific Neural Machine Translation
☆11Apr 1, 2021Updated 5 years ago
Xflick / EEND_PyTorch
View on GitHub
A PyTorch implementation of End-to-End Neural Diarization
☆110Jun 19, 2023Updated 3 years ago
RicherMans / SAT
View on GitHub
Streaming Audiotransformers for online Audio tagging
☆57Jun 14, 2024Updated 2 years ago
WowCZ / LongMIT
View on GitHub
LongMIT: Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets
☆43Sep 30, 2024Updated last year
whull / end2end_ASR
View on GitHub
端到端语音识别实现；包含LAS、CTC、RNNT解码方式，模型SA(MHA)、LSTM、CNN、DFSMN等
☆15Jun 4, 2021Updated 5 years ago