Pre-training Cross-modal Transformer for Audio-and-Language Representations
☆39Apr 20, 2021Updated 5 years ago
Alternatives and similar repositories for CTAL
Users that are interested in CTAL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Using YouTube to prepare a speech recognition dataset for any language☆10Mar 30, 2021Updated 5 years ago
- A tool to collect/validate audio recordings from workers on Amazon Mechanical Turk. Written in Python/Flask. (originally hosted on github…☆15Dec 19, 2022Updated 3 years ago
- BERT and LSTM baseline models of the ZeroSpeech Challenge 2021☆60Oct 19, 2022Updated 3 years ago
- [ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z…☆32Apr 8, 2022Updated 4 years ago
- Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer☆93Jun 9, 2022Updated 3 years ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)☆104Nov 26, 2022Updated 3 years ago
- Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022)☆18Dec 20, 2022Updated 3 years ago
- SpeechNAS-Better-Trade-off-between-Latency-and-Accuracy-for-Large-Scale-Speaker-Verification☆30Mar 24, 2023Updated 3 years ago
- A library of speech gadgets.☆14Oct 15, 2022Updated 3 years ago
- One-shot TTS with Improved Unseen Speaker and Style Transfer☆37Mar 2, 2022Updated 4 years ago
- Text Classification Dataset for Turkish Language☆10Nov 16, 2021Updated 4 years ago
- ☆14Feb 9, 2023Updated 3 years ago
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion☆38Sep 9, 2025Updated 8 months ago
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models.☆13Feb 13, 2021Updated 5 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Frechet Audio Distance evaluation in PyTorch☆36Jun 9, 2023Updated 2 years ago
- This is a repository for a paper accepted at the 2022 IEEE Spoken Language Technology Workshop (SLT 2022)☆17Dec 1, 2022Updated 3 years ago
- ☆11Nov 5, 2021Updated 4 years ago
- Feature extractor for DL speech processing.☆66Apr 13, 2022Updated 4 years ago
- Simple Telegram bot to annotate and varify automatic speech recognition datasets☆12Mar 30, 2021Updated 5 years ago
- Source code for ASRU 2019 paper "Adapting Pretrained Transformer to Lattices for Spoken Language Understanding"☆10Jul 8, 2020Updated 5 years ago
- This will hold the crowdsourcing platform to be used to store voice data from various speakers which will act as input dataset for speech…☆17Mar 6, 2023Updated 3 years ago
- **ICASSP 2022** 《Toward Degradation-Robust Voice Conversion》Using speech enhancement and end-to-end denoising training to improve degrada…☆24Sep 27, 2022Updated 3 years ago
- This is an unofficial implementation of universal melgan according to https://arxiv.org/abs/2011.09631☆23Aug 15, 2022Updated 3 years ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- A mini, simple, and fast end-to-end automatic speech recognition toolkit.☆53Dec 6, 2022Updated 3 years ago
- Repository for my paper: Dimensional Speech Emotion Recognition Using Acoustic Features and Word Embeddings using Multitask Learning☆17Aug 2, 2024Updated last year
- Sapsucker Woods 60 Audiovisual Dataset☆18Oct 7, 2022Updated 3 years ago
- Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".☆150Jul 13, 2023Updated 2 years ago
- Accompany code to reproduce the baselines of the International Multimodal Sentiment Analysis Challenge (MuSe 2020).☆16Dec 8, 2022Updated 3 years ago
- ☆13Oct 11, 2024Updated last year
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"☆13Dec 14, 2021Updated 4 years ago
- Question and answer retrieval in Turkish with BERT☆14Nov 30, 2021Updated 4 years ago
- Transformer based ASR Engine.☆13Aug 23, 2021Updated 4 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- This repository contains all the code necessary for running the multilingual distilwhisper from Ferraz et al. 2024 IEEE ICASSP paper.☆33Apr 22, 2026Updated last month
- [ICLR 2023] “ Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian…☆24Feb 16, 2023Updated 3 years ago
- experiments about AudioSet☆43Jul 22, 2023Updated 2 years ago
- Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations☆99Feb 20, 2026Updated 3 months ago
- steps to perform text-based speaker diarization with kaldi toolkit☆12Nov 2, 2018Updated 7 years ago
- [ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach☆20Aug 2, 2021Updated 4 years ago
- PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.☆74Aug 3, 2021Updated 4 years ago