[INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Recognizers via Hierarchical Distillation
☆41Sep 1, 2023Updated 2 years ago
Alternatives and similar repositories for CIF-HieraDist
Users who are interested in CIF-HieraDist are comparing it to the libraries listed below
- [ICASSP 2020] CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition (A PyTorch implementation of Continuous Integrate-and-…☆80Jan 9, 2025Updated last year
- [ICASSP 2022] Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection☆25May 18, 2023Updated 2 years ago
- ☆16Nov 9, 2023Updated 2 years ago
- ICASSP 2023: "Recursive Joint Attention for Audio-Visual Fusion in Regression Based Emotion Recognition"☆14Nov 29, 2024Updated last year
- Conformer RNN-Transducer☆14May 25, 2022Updated 3 years ago
- Implementation of CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning☆48Nov 8, 2023Updated 2 years ago
- NAR-BERT-ASR☆10Sep 27, 2021Updated 4 years ago
- ☆23Updated this week
- ☆14Nov 26, 2024Updated last year
- This setup allows training end-to-end neural models for spoken language understanding (SLU).☆11Jun 12, 2023Updated 2 years ago
- ForceAlign is a Python library for forced alignment of English text to English audio. You can use ForceAlign to get word or phoneme level…☆25Dec 4, 2024Updated last year
- A curated list of awesome papers on contextualizing E2E ASR outputs☆80May 10, 2023Updated 2 years ago
- Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5☆19Nov 29, 2022Updated 3 years ago
- E2E system with LF-MMI; word N-gram for Mandarin☆166Apr 29, 2022Updated 3 years ago
- Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)☆22Apr 27, 2024Updated last year
- The official repo of "WhiStress: Enriching Transcriptions with Sentence Stress Detection" (Interspeech 2025)☆36Jul 24, 2025Updated 7 months ago
- ☆25Apr 16, 2025Updated 10 months ago
- End-to-End Speech Processing Toolkit☆15Jan 20, 2025Updated last year
- Implementation of the contextual biasing for ASR decoding on GPUs without lattice generation. The code supports submission to Interspeech…☆21Sep 25, 2023Updated 2 years ago
- Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 202…☆27May 17, 2023Updated 2 years ago
- Code for the InterSpeech 2023 paper: MMER: Multimodal Multi-task learning for Speech Emotion Recognition☆81Mar 12, 2024Updated last year
- The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"☆25May 18, 2023Updated 2 years ago
- Data and code related to the ICASSP submission "A comparison of methods for OOV-word recognition"☆17Nov 28, 2021Updated 4 years ago
- PyTorch implementation of WASE described in our ICASSP 2021: "Wase: Learning When to Attend for Speaker Extraction in Cocktail Party Envi…☆26Jan 11, 2022Updated 4 years ago
- "MULTIMODAL EMOTION RECOGNITION BASED ON DEEP TEMPORAL FEATURES USING CROSS-MODAL TRANSFORMER AND SELF-ATTENTION" ICASSP'23☆23Feb 26, 2023Updated 3 years ago
- Official implementation of the paper "Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Unce…☆27Mar 13, 2025Updated 11 months ago
- Repo for the FB AI Speech team.☆25Aug 24, 2021Updated 4 years ago
- [EMNLP2023] Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction☆62Jul 8, 2024Updated last year
- The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".☆30Aug 2, 2025Updated 7 months ago
- This repository is the official implementation of unimodal aggregation (UMA) for automatic speech recognition (ASR).☆35Dec 17, 2024Updated last year
- INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"☆117Jan 26, 2024Updated 2 years ago
- Scientific Reports - Open access - Published: 14 February 2025☆49Oct 30, 2024Updated last year
- Official Implementation of "Prefix tuning for Automated Audio Captioning(ICASSP 2023)"☆31Dec 6, 2023Updated 2 years ago
- A unified framework for Low-resource Audio Processing and Evaluation (SSL Pre-training and Downstream Fine-tuning)☆29Jul 9, 2024Updated last year
- Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors"☆26Jun 15, 2022Updated 3 years ago
- PyTorch re-implementation of Speech-Transformer☆102Nov 19, 2021Updated 4 years ago
- Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (…☆61Apr 4, 2024Updated last year
- (WIP) Long-form speech generation☆31Apr 2, 2025Updated 11 months ago
- A CRF-based ASR Toolkit☆364Feb 5, 2026Updated last month