an-tran528/wavetransformer

Code base for WaveTransformer: A novel architecture for automated audio captioning

☆43

Alternatives and similar repositories for wavetransformer

Users who are interested in wavetransformer are comparing it to the libraries listed below.

audio-captioning / audio-captioning-papers
A list of papers about audio captioning
☆78 · Jul 1, 2022 · Updated 4 years ago
audio-captioning / dcase-2020-baseline
Audio captioning baseline system for DCASE 2020 challenge.
☆38 · Aug 22, 2023 · Updated 2 years ago
ws-choi / AMSS-Net
A PyTorch implementation of the paper: "AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries" (ACM Multimedia 2021…
☆21 · Jul 4, 2021 · Updated 5 years ago
WangHelin1997 / Aty-TTS
Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech
☆11 · May 14, 2025 · Updated last year
vibertthio / musical-ml-web-demo-minimal-template
A template that is specifically designed to demonstrate symbolic musical machine learning models on the web. The template comes with a s…
☆19 · Feb 26, 2019 · Updated 7 years ago
wilkinghoff / dcase2022
Submission for task 2 "Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques"…
☆16 · Sep 19, 2022 · Updated 3 years ago
audio-captioning / audio-captioning-resources
A list of resources that can help in research for automated audio captioning
☆34 · Feb 17, 2021 · Updated 5 years ago
bill317996 / Singer-identification-in-artist20
Addressing the confounds of accompaniments in singer identification
☆18 · Mar 24, 2020 · Updated 6 years ago
shanguanma / Aligners
HMM, CTC, RNN-Transducer, forward-backward algorithm
☆20 · Sep 5, 2023 · Updated 2 years ago
YeongHyeon / Skip-GANomaly
Implementation of Skip-GANomaly with MNIST dataset
☆11 · Nov 28, 2019 · Updated 6 years ago
vadimkantorov / inferspeech
PyTorch speech2text inference script for the NVidia openseq2seq wav2letter model variant
☆10 · Aug 12, 2019 · Updated 6 years ago
ta012 / DTFAT
[AAAI 2024] DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification
☆12 · Mar 10, 2025 · Updated last year
jblsmith / loopextractor
A Python script for extracting loops from audio files.
☆54 · Jul 26, 2024 · Updated 2 years ago
aframires / freesound-loop-annotator
A web app for annotating Freesound loops, and the tools to analyse the dataset created.
☆20 · Jul 6, 2023 · Updated 3 years ago
WangHelin1997 / AT-GCN
PyTorch implementation of the paper: Modeling Label Dependencies for Audio Tagging with Graph Convolutional Network
☆15 · Sep 18, 2020 · Updated 5 years ago
tqbl / arca23k-dataset
The code used to create the ARCA23K and ARCA23K-FSD datasets
☆16 · Nov 9, 2021 · Updated 4 years ago
reichang182 / variable-length-piano-infilling
The official implementation of Variable-Length Piano Infilling (VLI).
☆35 · Aug 31, 2021 · Updated 4 years ago
SpeechColab / PySpeechColab
A library of speech gadgets.
☆15 · Oct 15, 2022 · Updated 3 years ago
gpu-poor / gramvaani_hindi_asr
This repo contains the baseline model recipes and pre-trained model for the GramVaani Hindi ASR challenge
☆16 · Mar 26, 2022 · Updated 4 years ago
haoheliu / diffres-python
Learning differentiable temporal resolution on time-series data.
☆36 · Nov 12, 2022 · Updated 3 years ago
DCASE-REPO / dcase_util
A collection of utilities for Detection and Classification of Acoustic Scenes and Events
☆135 · Apr 3, 2025 · Updated last year
placebokkk / ctc-asr
PyTorch CTC implementation for ASR. Uses eesen's FST decoder framework
☆10 · Feb 27, 2020 · Updated 6 years ago
liuxubo717 / cl4ac
Code for "CL4AC: A Contrastive Loss for Audio Captioning", DCASE Workshop 2021.
☆45 · Oct 8, 2021 · Updated 4 years ago
emirdemirel / DALI-TestSet4ALT
This is a subset of the DALI set consisting of 240 polyphonic recordings that is used to benchmark lyrics transcription evaluation.
☆12 · Nov 30, 2021 · Updated 4 years ago
syedajannatulferdous121 / transformer
The MATLAB code implements a Transformer model, a recent innovation in deep neural networks. It includes modules for multi-head attention…
☆11 · Jul 5, 2023 · Updated 3 years ago
ashishpatel26 / Audio-Masking-Methods
Audio Masking Methods
☆12 · Nov 15, 2019 · Updated 6 years ago
kevinco27 / attentional-similarity
PyTorch implementation of [Learning to match transient sound events using attentional similarity for few-shot sound recognition]
☆33 · Feb 27, 2019 · Updated 7 years ago
parth2170 / DCASE2020-Task2
☆14 · Jun 18, 2020 · Updated 6 years ago
gefleury / datascientest_anomalous_sounds
Anomalous sound detection with machine learning and deep learning
☆14 · Jun 24, 2024 · Updated 2 years ago
KeisukeImoto / RWCPSSD_Onomatopoeia
RWCP-SSD-Onomatopoeia
☆24 · Jun 28, 2023 · Updated 3 years ago
CVSSP / perceptual-study-source-separation
Repository for subjective and objective evaluation of source separation algorithms
☆12 · Apr 18, 2018 · Updated 8 years ago
looking-for-my-magic-bean / DCASE2020-TASK2-semi-VAE
☆10 · Jun 20, 2020 · Updated 6 years ago
cyfer0618 / kaldi-pytorch-rnnlm
Enable RNNLM lattice rescoring with PyTorch [kaldi]
☆12 · Jun 5, 2020 · Updated 6 years ago
vadimkantorov / readaudio
Read audio with FFmpeg into NumPy/PyTorch via ctypes (standard library module)
☆11 · Aug 12, 2020 · Updated 5 years ago
JozefColdenhoff / OpenACE
☆11 · Aug 1, 2025 · Updated 11 months ago
MTG / DCASE-models
Python library for rapid prototyping of environmental sound analysis systems
☆44 · May 20, 2022 · Updated 4 years ago
Abimbola-ai / Oil-and-gas-pipeline-leakage
☆19 · Dec 9, 2020 · Updated 5 years ago
csukuangfj / kaldilm
Python wrapper for kaldi's arpa2fst
☆38 · Aug 27, 2025 · Updated 11 months ago
csteinmetz1 / pyloudnorm-eval
Evaluation of a number of loudness meter implementations
☆13 · Aug 28, 2021 · Updated 4 years ago