Ydkwim/CTAL

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Ydkwim/CTAL)

Ydkwim / CTAL

Pre-training Cross-modal Transformer for Audio-and-Language Representations

☆39

Alternatives and similar repositories for CTAL

Users that are interested in CTAL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

TehreemFarooqi / Preparing-a-speech-recognition-dataset-using-YouTube-videos
View on GitHub
Using YouTube to prepare a speech recognition dataset for any language
☆10Mar 30, 2021Updated 5 years ago
zerospeech / zerospeech2021_baseline
View on GitHub
BERT and LSTM baseline models of the ZeroSpeech Challenge 2021
☆60Oct 19, 2022Updated 3 years ago
VITA-Group / Audio-Lottery
View on GitHub
[ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z…
☆32Apr 8, 2022Updated 4 years ago
babua / TTSDatasetRecorder
View on GitHub
A simple app for recording speech datasets.
☆26Jun 27, 2022Updated 4 years ago
AlanBaade / MAE-AST-Public
View on GitHub
Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
☆93Jun 9, 2022Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
awslabs / speech-representations
View on GitHub
Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)
☆104Nov 26, 2022Updated 3 years ago
wentaozhu / speechnas
View on GitHub
SpeechNAS-Better-Trade-off-between-Latency-and-Accuracy-for-Large-Scale-Speaker-Verification
☆30Mar 24, 2023Updated 3 years ago
SpeechColab / PySpeechColab
View on GitHub
A library of speech gadgets.
☆15Oct 15, 2022Updated 3 years ago
CMsmartvoice / Unet-TTS
View on GitHub
One-shot TTS with Improved Unseen Speaker and Style Transfer
☆37Mar 2, 2022Updated 4 years ago
savasy / TC32
View on GitHub
Text Classification Dataset for Turkish Language
☆10Nov 16, 2021Updated 4 years ago
ErikEkstedt / conv_ssl
View on GitHub
☆14Feb 9, 2023Updated 3 years ago
samuelyu2002 / PACS
View on GitHub
Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022)
☆18Dec 20, 2022Updated 3 years ago
desh2608 / kaldi-noise-vectors
View on GitHub
Implementation of different noise embeddings for noise aware training of Kaldi acoustic models.
☆13Feb 13, 2021Updated 5 years ago
Open-Speech-EkStep / crowdsource-dataplatform
View on GitHub
This will hold the crowdsourcing platform to be used to store voice data from various speakers which will act as input dataset for speech…
☆17Mar 6, 2023Updated 3 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
idiap / bert-text-diarization-atc
View on GitHub
This is a repository for a paper accepted at the 2022 IEEE Spoken Language Technology Workshop (SLT 2022)
☆17Dec 1, 2022Updated 3 years ago
drscotthawley / fad_pytorch
View on GitHub
Frechet Audio Distance evaluation in PyTorch
☆36Jun 9, 2023Updated 3 years ago
leto19 / WhiSQA
View on GitHub
Whisper Speech Quality Assessment (WhiSQA)
☆16Apr 14, 2026Updated 3 months ago
bepierre / SpeechVGG
View on GitHub
Feature extractor for DL speech processing.
☆66Apr 13, 2022Updated 4 years ago
patrickvonplaten / Wav2Vec2_ParlanceCTCDecode
View on GitHub
☆11Nov 5, 2021Updated 4 years ago
MiuLab / Lattice-Transformer-SLU
View on GitHub
Source code for ASRU 2019 paper "Adapting Pretrained Transformer to Lattices for Spoken Language Understanding"
☆10Jul 8, 2020Updated 6 years ago
monatis / asr-annotation-bot
View on GitHub
Simple Telegram bot to annotate and varify automatic speech recognition datasets
☆12Mar 30, 2021Updated 5 years ago
bagustris / dimensional-ser
View on GitHub
Repository for my paper: Dimensional Speech Emotion Recognition Using Acoustic Features and Word Embeddings using Multitask Learning
☆17Aug 2, 2024Updated last year
soupdtag / speak-tool
View on GitHub
A tool to collect/validate audio recordings from workers on Amazon Mechanical Turk. Written in Python/Flask. (originally hosted on github…
☆16Dec 19, 2022Updated 3 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
visipedia / ssw60
View on GitHub
Sapsucker Woods 60 Audiovisual Dataset
☆19Oct 7, 2022Updated 3 years ago
vectominist / MiniASR
View on GitHub
A mini, simple, and fast end-to-end automatic speech recognition toolkit.
☆53Dec 6, 2022Updated 3 years ago
avi33 / universalmelgan
View on GitHub
This is an unofficial implementation of universal melgan according to https://arxiv.org/abs/2011.09631
☆23Aug 15, 2022Updated 3 years ago
awasthiabhijeet / Error-Driven-ASR-Personalization
View on GitHub
Code for "Error-driven Fixed-Budget ASR Personalization for Accented Speakers" in ICASSP 2021
☆11Jun 13, 2021Updated 5 years ago
ga642381 / RobustVC
View on GitHub
**ICASSP 2022** 《Toward Degradation-Robust Voice Conversion》Using speech enhancement and end-to-end denoising training to improve degrada…
☆24Sep 27, 2022Updated 3 years ago
lstappen / MuSe2020
View on GitHub
Accompany code to reproduce the baselines of the International Multimodal Sentiment Analysis Challenge (MuSe 2020).
☆16Dec 8, 2022Updated 3 years ago
YuanGongND / psla
View on GitHub
Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".
☆150Jul 13, 2023Updated 3 years ago
introlab / uimvdr
View on GitHub
☆13Oct 11, 2024Updated last year
monatis / trfaq
View on GitHub
Question and answer retrieval in Turkish with BERT
☆14Nov 30, 2021Updated 4 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
boschresearch / adversarial_meta_embeddings
View on GitHub
Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"
☆13Dec 14, 2021Updated 4 years ago
fanlu / wenet
View on GitHub
Transformer based ASR Engine.
☆13Aug 23, 2021Updated 4 years ago
nttcslab / msm-mae
View on GitHub
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
☆99Feb 20, 2026Updated 5 months ago
wangyu / rethink-audio-fsl
View on GitHub
Who calls the shots? Rethinking Few-Shot Learning for Audio (WASPAA 2021)
☆43May 24, 2022Updated 4 years ago
zhaojw1998 / Query-and-reArrange
View on GitHub
Code and demo for paper: Zhao et al., "Q&A: Query-Based Representation Learning for Multi-Track Symbolic Music re-Arrangement," IJCAI 202…
☆21May 2, 2024Updated 2 years ago
lijuncheng16 / AudioTaggingDoneRight
View on GitHub
experiments about AudioSet
☆43Jul 22, 2023Updated 3 years ago
talhanai / kaldi-diar-latte
View on GitHub
steps to perform text-based speaker diarization with kaldi toolkit
☆12Nov 2, 2018Updated 7 years ago