Orlllem/seld_wav2vec2

☆18

Alternatives and similar repositories for seld_wav2vec2

Users who are interested in seld_wav2vec2 are comparing it to the repositories listed below.

dberghi / AV-SELD
Python implementation of the paper "Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection"
☆31 · Apr 26, 2024 · Updated 2 years ago
nttrd-mdlab / wearable-seld-dataset
☆10 · Feb 18, 2022 · Updated 4 years ago
muuda / MFF-EINV2
MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection
☆22 · Jul 17, 2024 · Updated 2 years ago
Jinbo-Hu / SELD-Data-Generator
Data generator for sound event localization and detection clips, including 4-ch microphone-array-format signals and first-order-ambisonic…
☆22 · Nov 13, 2024 · Updated last year
yusunnny / CST-former
CST-former: Transformer with Channel-Spectro-Temporal Attention for Sound Event Localization and Detection (ICASSP 2024)
☆39 · May 20, 2025 · Updated last year
danielkrause / DCASE2022-data-generator
Data generator for creating synthetic audio mixtures suitable for DCASE Challenge 2022 Task 3
☆47 · Apr 5, 2023 · Updated 3 years ago
Jinbo-Hu / PSELDNets
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection
☆47 · Sep 17, 2025 · Updated 10 months ago
MRSAudio / MRSAudio_Main
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
☆43 · Oct 15, 2025 · Updated 9 months ago
PeiwenSun2000 / Both-Ears-Wide-Open
The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
☆65 · Jul 2, 2025 · Updated last year
jin-woo-lee / nfs-binaural
☆13 · Aug 13, 2023 · Updated 2 years ago
sadPororo / AD-YOLO
AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection, IEEE ICASSP 2023
☆35 · Dec 21, 2025 · Updated 7 months ago
partha2409 / DCASE2024_seld_baseline
☆52 · Dec 13, 2025 · Updated 7 months ago
Audio-WestlakeU / SAR-SSL
A Python implementation of “Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Mult…
☆40 · Oct 11, 2024 · Updated last year
sxxmason / FGANomaly
Implementation of FGANomaly
☆17 · Sep 22, 2021 · Updated 4 years ago
Yhonatangayer / shroom
Spherical Harmonics ROOM, an open-source Python library for room acoustics simulation using Ambisonics, https://arxiv.org/abs/2603.27342,…
☆19 · Jul 12, 2026 · Updated last week
htqin / BiFSMN
PyTorch implementation of BiFSMN, IJCAI 2022
☆22 · Feb 10, 2023 · Updated 3 years ago
nttcslab / dcase2025_task4_baseline
☆18 · Apr 16, 2026 · Updated 3 months ago
kinggongzilla / DCASE2023_Task2
☆23 · May 15, 2023 · Updated 3 years ago
michaelneri / audio-distance-estimation
Official repository of the work "Speaker Distance Estimation in Enclosures from Single-Channel Audio" published to IEEE/ACM Transactions …
☆40 · Jun 29, 2026 · Updated 3 weeks ago
michaelneri / unsupervised-audio-anomaly-detection
Official repository of the work "Low-complexity Unsupervised Audio Anomaly Detection exploiting Separable Convolutions and Angular Loss" …
☆11 · Nov 6, 2024 · Updated last year
b-sigpro / sed-hsmm
Onset-and-Offset-Aware Sound Event Detection
☆21 · Feb 10, 2025 · Updated last year
MaikeZuefle / f-actor
☆28 · Jul 17, 2026 · Updated last week
BUTSpeechFIT / SOT-DiCoW
Multi-talker ASR based on DiCoW with Serialized Output Training
☆20 · Sep 18, 2025 · Updated 10 months ago
Hunterhuan / sphereface2_speaker_verification
Exploring Binary Classification Loss for Speaker Verification
☆18 · Jul 18, 2023 · Updated 3 years ago
zszheng147 / Spatial-AST
🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)
☆87 · Feb 13, 2025 · Updated last year
Annmixiu / MTANet
INTERSPEECH2023: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music
☆31 · May 27, 2024 · Updated 2 years ago
soonhyeon / Noisy-ArcMix
Noisy-ArcMix: Additive Noisy Angular Margin Loss Combined With Mixup for Anomalous Sound Detection
☆31 · Aug 22, 2024 · Updated last year
marl / SpatialScaper
☆75 · Aug 7, 2025 · Updated 11 months ago
Tencent / StableToken
[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.
☆33 · Feb 27, 2026 · Updated 4 months ago
sarulab-speech / SpatialCLAP
☆19 · Oct 9, 2025 · Updated 9 months ago
kyamauchi1023 / PL-BERT-ja
A repository of Japanese Phoneme-Level BERT
☆24 · Dec 16, 2023 · Updated 2 years ago
danielkrause / Moving-Binaural-SDEL
Implementation of the paper "Binaural Sound Source Distance Estimation and Localization for a Moving Listener"
☆22 · Mar 2, 2025 · Updated last year
wilkinghoff / DSpAST
Code for the paper "DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models"
☆17 · Oct 23, 2025 · Updated 9 months ago
merlresearch / reverberation-as-supervision
Enhanced Reverberation As Supervision (ERAS) for unsupervised reverberant speech separation
☆15 · Aug 1, 2024 · Updated last year
ArrayDPS / ArrayDPS
☆40 · May 12, 2025 · Updated last year
korakoe / VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io
☆16 · Apr 18, 2024 · Updated 2 years ago
sh01k / AmplitudeMatching
A multizone sound field control method to synthesize a desired amplitude (or magnitude) distribution over a target region with multiple …
☆15 · Mar 30, 2023 · Updated 3 years ago
llm-jp / llama-mimi
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…
☆31 · Sep 20, 2025 · Updated 10 months ago
DabDans / AudioMarathon
Code for "AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs"
☆26 · Oct 9, 2025 · Updated 9 months ago