mingu6 / declarativedtw
Reference implementation of DecDTW in PyTorch (ICLR 2023)
☆19 · Updated last year
Related projects:
- ☆14 · Updated last year
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners" ☆28 · Updated last year
- Audio propagation engine - Meta Reality Labs Research. ☆16 · Updated last year
- The PyTorch implementation of the paper "Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training" ☆39 · Updated last year
- A spoken version of the textual story cloze benchmark ☆12 · Updated last year
- ☆13 · Updated 2 years ago
- Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models ☆29 · Updated 2 years ago
- The demo for "Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem". ☆12 · Updated 2 years ago
- Accompanying code for our paper "Optimizing Short-Time Fourier Transform Parameters via Gradient Descent" ☆31 · Updated 3 years ago
- PyTorch implementations of various vector quantization methods ☆25 · Updated 3 years ago
- Official PyTorch implementation for "Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations". ☆20 · Updated 2 years ago
- Musical Word Embedding for Music Tagging and Retrieval [IEEE TASLP] ☆21 · Updated 4 months ago
- [TOMM 2024] Automatic Lyric Transcription and Automatic Music Transcription from Multimodal Singing ☆16 · Updated 3 weeks ago
- ISMIR 24 Supplementary Material ☆10 · Updated 4 months ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆61 · Updated 2 years ago
- GroupMap: beyond mean and variance matching for deep learning ☆10 · Updated last year
- Test-time adaptation for a speech recognition model by single utterance. The official implementation of "Listen, Adapt, Better WER: Source-… ☆15 · Updated 2 years ago
- Code for the paper "InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents". ☆18 · Updated 3 years ago
- SRTNet ☆24 · Updated last year
- TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR. ☆26 · Updated last year
- ☆29 · Updated 8 months ago
- Project website for "Telling left from right: Learning spatial correspondence between sight and sound" ☆20 · Updated 2 years ago
- Interspeech Tutorial - Resource Efficient and Cross-Modal Learning Toward Foundation Modeling ☆15 · Updated 11 months ago
- ☆17 · Updated 3 years ago
- Repo for Visual Acoustic Matching, CVPR 2022 ☆60 · Updated last year
- JAX/Flax implementation of Variational-DiffWave. ☆40 · Updated 2 years ago
- This repository contains the baseline system for CHiME-8 MMCSG challenge focusing on transcribing both sides of a conversation where one … ☆26 · Updated 6 months ago
- Residual Quantization with Implicit Neural Codebooks ☆44 · Updated last month
- My attempts at applying SoundStream design on learned tokenization of text and then applying hierarchical attention to text generation ☆76 · Updated last year
- Source code for the paper 'Audio Captioning Transformer' ☆47 · Updated 2 years ago