zszheng147/Spatial-AST

🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)

☆87

Alternatives and similar repositories for Spatial-AST

Users who are interested in Spatial-AST are comparing it to the repositories listed below.

MRSAudio / MRSAudio_Main
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
☆43 · Oct 15, 2025 · Updated 9 months ago
wilkinghoff / DSpAST
Code for the paper "DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models"
☆17 · Oct 23, 2025 · Updated 9 months ago
dieKarotte / ASAudio
☆59 · Oct 19, 2025 · Updated 9 months ago
dieKarotte / Spatial-Omni
☆28 · Jun 17, 2026 · Updated last month
BASHLab / OWL
☆15 · May 25, 2026 · Updated 2 months ago
InternLM / StarBench
[ICLR 2026] An official implementation of "STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence"
☆42 · Apr 19, 2026 · Updated 3 months ago
shlizee / savvy
Repository for SAVVY (Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing) Benchmark and SAVVY model
☆25 · May 30, 2026 · Updated last month
PeiwenSun2000 / Both-Ears-Wide-Open
The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
☆65 · Jul 2, 2025 · Updated last year
jaeyeonkim99 / visage
Official implementation of "ViSAGe: Video-to-Spatial Audio Generation" (ICLR 2025)
☆47 · Sep 10, 2025 · Updated 10 months ago
facebookresearch / rlr-audio-propagation
Audio propagation engine - Meta Reality Labs Research.
☆24 · Nov 1, 2022 · Updated 3 years ago
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214 · Feb 25, 2026 · Updated 4 months ago
ta012 / SSLAM
[ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
☆79 · Oct 8, 2025 · Updated 9 months ago
Audio-WestlakeU / SAR-SSL
A Python implementation of “Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Mult…
☆40 · Oct 11, 2024 · Updated last year
danielkrause / Moving-Binaural-SDEL
Implementation of the paper "Binaural Sound Source Distance Estimation and Localization for a Moving Listener"
☆22 · Mar 2, 2025 · Updated last year
01Zhangbw / Speech-and-audio-papers-Top-Conference
☆141 · Jan 24, 2026 · Updated 6 months ago
cwx-worst-one / EAT
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
☆239 · Nov 30, 2025 · Updated 7 months ago
sharathadavanne / seld-dcase2023
Baseline method for sound event localization task of DCASE 2023 challenge
☆71 · Mar 13, 2023 · Updated 3 years ago
Orlllem / seld_wav2vec2
☆18 · Feb 1, 2026 · Updated 5 months ago
Ego4DSounds / Ego4DSounds
Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
☆21 · Jun 14, 2024 · Updated 2 years ago
the-bird-F / GLM-Voice-RAG
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31 · Jul 11, 2025 · Updated last year
Jinbo-Hu / SELD-Data-Generator
Data generator for sound event localization and detection clips, including 4-ch microphone-array-format signals and first-order-ambisonic…
☆22 · Nov 13, 2024 · Updated last year
facebookresearch / sound-spaces
A first-of-its-kind acoustic simulation platform for audio-visual embodied AI research. It supports training and evaluating multiple task…
☆468 · Sep 29, 2023 · Updated 2 years ago
dberghi / AV-SELD
Python implementation of the paper "Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection"
☆31 · Apr 26, 2024 · Updated 2 years ago
xiquan-li / Awesome-Audio-Generation
Curated list of papers, code, and resources related to Text-to-Audio (TTA) Generation
☆74 · Updated this week
Jinbo-Hu / PSELDNets
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection
☆47 · Sep 17, 2025 · Updated 10 months ago
danielkrause / DCASE2022-data-generator
Data generator for creating synthetic audio mixtures suitable for DCASE Challenge 2022 Task 3
☆47 · Apr 5, 2023 · Updated 3 years ago
QxLabIreland / Binamix
A Python Library for Binaural Mixing and Data Generation
☆56 · Jan 23, 2026 · Updated 6 months ago
DCASE2024-Task7-Sound-Scene-Synthesis / AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
☆14 · Mar 27, 2024 · Updated 2 years ago
ChanganVR / action2sound
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
☆26 · Oct 1, 2024 · Updated last year
X-LANCE / SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,049 · Jan 15, 2026 · Updated 6 months ago
marl / SpatialScaper
☆75 · Aug 7, 2025 · Updated 11 months ago
facebookresearch / real-acoustic-fields
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
☆64 · Aug 29, 2024 · Updated last year
whojavumusic / HARP
HARP: A Large-Scale Higher-Order Ambisonic Room Impulse Response Dataset
☆35 · Jun 3, 2025 · Updated last year
XinhaoMei / WavCaps
This repository contains metadata of the WavCaps dataset and code for downstream tasks.
☆264 · Jul 25, 2024 · Updated last year
partha2409 / DCASE2024_seld_baseline
☆52 · Dec 13, 2025 · Updated 7 months ago
Audio-WestlakeU / FN-SSL
The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization [INTERSPEECH2023 & TASLP2024]
☆159 · Mar 10, 2026 · Updated 4 months ago
Xiaohao-Liu / Awesome-Vison2Audio
A curated list of Vision (video/image) to Audio Generation
☆107 · Feb 10, 2026 · Updated 5 months ago
Labbeti / aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
☆75 · Mar 22, 2026 · Updated 4 months ago
sharathadavanne / seld-dcase2022
Baseline method for sound event localization task of DCASE 2022 challenge
☆64 · Jun 21, 2022 · Updated 4 years ago