ServiceNow/AU-Harness

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ServiceNow/AU-Harness)

ServiceNow / AU-Harness

A comprehensive framework to test audio comprehension of Large Audio Language Models.

☆67

Alternatives and similar repositories for AU-Harness

Users that are interested in AU-Harness are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ServiceNow / eva
View on GitHub
A New End-to-end Framework for Evaluating Voice Agents
☆186Updated this week
b-sigpro / sed-hsmm
View on GitHub
Onset-and-Offset-Aware Sound Event Detection
☆21Feb 10, 2025Updated last year
llm-jp / llama-mimi
View on GitHub
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…
☆31Sep 20, 2025Updated 10 months ago
BUTSpeechFIT / SOT-DiCoW
View on GitHub
Multi-talker ASR based on DiCoW with Serialized Output Training
☆20Sep 18, 2025Updated 10 months ago
wdqqdw / Echo
View on GitHub
Project page of "2026-ICLR Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning"
☆16Mar 26, 2026Updated 3 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
MRSAudio / MRSAudio_Main
View on GitHub
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
☆43Oct 15, 2025Updated 9 months ago
ddlBoJack / Omni-Captioner
View on GitHub
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆142Apr 7, 2026Updated 3 months ago
nttrd-mdlab / wearable-seld-dataset
View on GitHub
☆10Feb 18, 2022Updated 4 years ago
sarulab-speech / SpatialCLAP
View on GitHub
☆19Oct 9, 2025Updated 9 months ago
EIT-NLP / LLaSO
View on GitHub
☆116Oct 21, 2025Updated 9 months ago
yu-haoyuan / fd-badcat
View on GitHub
fd-sds
☆20Apr 8, 2026Updated 3 months ago
XiaomiMiMo / MiMo-Audio-Eval
View on GitHub
☆88Jun 17, 2026Updated last month
google-research / mseb
View on GitHub
☆63Jul 10, 2026Updated 2 weeks ago
MatthewCYM / VoiceBench
View on GitHub
[TACL'26] VoiceBench: Benchmarking LLM-Based Voice Assistants
☆378Jun 11, 2026Updated last month
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
Orlllem / seld_wav2vec2
View on GitHub
☆18Feb 1, 2026Updated 5 months ago
MaikeZuefle / f-actor
View on GitHub
☆28Jul 17, 2026Updated last week
koudounasalkis / voc2vec
View on GitHub
This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025.
☆57Apr 14, 2025Updated last year
JethroWangSir / SincQDR-VAD
View on GitHub
☆26Aug 29, 2025Updated 10 months ago
SonyResearch / dcase2025_stereo_seld_data_generator
View on GitHub
Data generator for stereo sound event localization and detection task of DCASE 2025 challenge
☆17Jul 17, 2025Updated last year
khfs / DuplexMamba
View on GitHub
☆18Mar 6, 2026Updated 4 months ago
dreamtheater123 / VoxEval
View on GitHub
Github repository for ACL 2025 paper: VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
☆24Jun 16, 2025Updated last year
KeiKinn / ParaCLAP
View on GitHub
Towards a general language-audio model for computational paralinguistic tasks
☆30Dec 14, 2024Updated last year
kehanlu / DeSTA2.5-Audio
View on GitHub
Code for DeSTA2.5-Audio, general-purpose LALM
☆140Feb 4, 2026Updated 5 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
LVYUERLVR / OutboundEval-Xbench
View on GitHub
OutboundEval, a comprehensive benchmark for evaluating large language models (LLMs) in expert-level intelligent outbound calling scenario…
☆17Oct 28, 2025Updated 8 months ago
lysanderism / TimeAudio
View on GitHub
The official repository TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…
☆30Nov 18, 2025Updated 8 months ago
ina-foss / InaGVAD
View on GitHub
Voice activity detection and speaker gender segmentation audiovisual corpus
☆16Jan 20, 2025Updated last year
audiosae / audio-sae
View on GitHub
Demo for AudioSAE paper
☆15Apr 26, 2026Updated 2 months ago
mileskuo42 / AudioMarkBench
View on GitHub
Dataset/code for AudioMarkBench: Benchmarking Robustness of Audio Watermarking
☆48Aug 23, 2024Updated last year
IiuZiKai / Evo_TSE
View on GitHub
☆17Apr 9, 2026Updated 3 months ago
Helw150 / levanter
View on GitHub
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
☆16Jun 16, 2024Updated 2 years ago
GLJS / AudioToolAgent
View on GitHub
GitHub repository for AudioToolAgent
☆20Feb 13, 2026Updated 5 months ago
line / WaveTrainerFit
View on GitHub
Official implementation of "Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech G…
☆16Feb 6, 2026Updated 5 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
DabDans / AudioMarathon
View on GitHub
Code for "AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs"
☆26Oct 9, 2025Updated 9 months ago
PeiwenSun2000 / Both-Ears-Wide-Open
View on GitHub
The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
☆65Jul 2, 2025Updated last year
wavlab-speech / cmu_multilingual_speech
View on GitHub
CMU multilingual speech repository
☆30Apr 15, 2022Updated 4 years ago
AudenAI / Auden
View on GitHub
☆71Apr 2, 2026Updated 3 months ago
kaistmm / seed-pytorch
View on GitHub
[INTERSPEECH 2025] Official code for "SEED: Speaker Embedding Enhancement Diffusion Model"
☆59Nov 3, 2025Updated 8 months ago
LAION-AI / scaled-echo-tts
View on GitHub
Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024…
☆24Mar 29, 2026Updated 3 months ago
umbertocappellazzo / Omni-AVSR
View on GitHub
Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…
☆38Mar 10, 2026Updated 4 months ago