microsoft / NoAudioCaptioning
Repository for "Training Audio Captioning Models without Audio"
☆9Updated last year
Alternatives and similar repositories for NoAudioCaptioning:
Users who are interested in NoAudioCaptioning are comparing it to the libraries listed below:
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT☆38Updated 4 months ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…☆36Updated last year
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.☆13Updated last year
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024)☆22Updated 10 months ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model☆31Updated last year
- (Hybrid) BYOL-S feature extractor using serab-byols package in pytorch.☆27Updated 9 months ago
- Word Discovery in Visually Grounded, Self-Supervised Speech Models☆26Updated last year
- Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech☆10Updated last year
- SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"☆33Updated last year
- This is the implementation of our Interspeech 2022 paper "Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conv…☆18Updated last year
- Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clusterin …☆48Updated last year
- CDER (Conversational Diarization Error Rate) Scoring Tool☆17Updated 2 years ago
- DUSTED: Spoken-Term Discovery using Discrete Speech Units☆15Updated 3 months ago
- Both audio-only and audio-visual speaker diarization datasets are listed here.☆11Updated last year
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …☆47Updated 3 weeks ago
- This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…☆17Updated 2 years ago
- Training code and trained checkpoints for ASGAN.☆62Updated last year
- SAMO: SPEAKER ATTRACTOR MULTI-CENTER ONE-CLASS LEARNING FOR VOICE ANTI-SPOOFING☆37Updated last year
- Error correction back-end for speaker diarization☆15Updated last year
- Code for the paper "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"☆50Updated this week
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat…☆32Updated 7 months ago
- Official Implementation of EnCLAP (ICASSP 2024)☆90Updated 7 months ago
- Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis.☆87Updated 2 years ago