kuan2jiu99/audio-hallucination

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kuan2jiu99/audio-hallucination)

kuan2jiu99 / audio-hallucination

Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024

☆34

Alternatives and similar repositories for audio-hallucination

Users that are interested in audio-hallucination are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

voidful / llm-codec
View on GitHub
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23May 3, 2026Updated 2 months ago
microsoft / AudioEntailment
View on GitHub
Audio Entailment: Deductive Reasoning for Audio Understanding
☆17Dec 10, 2024Updated last year
Sreyan88 / Disfluency-Detection-with-Span-Classification
View on GitHub
This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp…
☆14Jun 6, 2023Updated 3 years ago
huaidanquede / Dense-TSNet
View on GitHub
offical code for Dense-TSNet
☆12Sep 17, 2024Updated last year
uthree / ddsp-vocoder
View on GitHub
☆12Nov 7, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ga642381 / Spoken-Dialogue-Model-Survey
View on GitHub
A survey of spoken dialogue models (SDMs) with speech input and speech output. Focus on their Intermediate Representation and Generation …
☆31Mar 24, 2026Updated 4 months ago
TuZehai / Sheffield_Clarity_CEC1_Entry
View on GitHub
Implementation of Sheffield entry for Clarity enhancement challenge.
☆18Apr 19, 2022Updated 4 years ago
Sreyan88 / CompA
View on GitHub
Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
☆23Jul 10, 2024Updated 2 years ago
ag1988 / mel-asr
View on GitHub
The accompanying code for "Exploring the limits of decoder-only models trained on public speech recognition corpora" (Ankit Gupta, George…
☆21Oct 11, 2024Updated last year
MaikeZuefle / f-actor
View on GitHub
☆28Jul 17, 2026Updated last week
ga642381 / Taiwanese-Speech-Synthesis
View on GitHub
Taiwanese Speech Synthesis with Tacotron2
☆26Oct 2, 2022Updated 3 years ago
JSALT-2022-SSL / superb-prosody
View on GitHub
☆31Jul 13, 2023Updated 3 years ago
Sreyan88 / RECAP
View on GitHub
Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning
☆16Jun 23, 2024Updated 2 years ago
zeyuxie29 / AudioTime
View on GitHub
☆39Jul 4, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
kamperh / vqwordseg
View on GitHub
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.
☆39May 5, 2026Updated 2 months ago
soham97 / ADIFF
View on GitHub
Explaining audio differences using language
☆16Feb 11, 2025Updated last year
Yip-Jia-Qi / codecformer
View on GitHub
☆21Jul 15, 2024Updated 2 years ago
khfs / DuplexMamba
View on GitHub
☆18Mar 6, 2026Updated 4 months ago
Sakshi113 / MMAU
View on GitHub
☆156Feb 9, 2026Updated 5 months ago
ddlBoJack / MT4SSL
View on GitHub
[INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In…
☆45Mar 25, 2024Updated 2 years ago
navana-tech / baseline_recipe_is21s_indic_asr_challenge
View on GitHub
Multilingual and code-switching ASR challenges for low resource Indian languages.
☆23Jul 26, 2021Updated 5 years ago
Alexander-H-Liu / dinosr
View on GitHub
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
☆53Jan 18, 2024Updated 2 years ago
kyegomez / AudioFlamingo
View on GitHub
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…
☆39Jan 27, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
arenjansen / ZRTools
View on GitHub
Zero-Resource Speech Discovery, Search, and Evaluation Tools
☆29Aug 6, 2015Updated 10 years ago
xiquan-li / MeanAudio
View on GitHub
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆145Sep 2, 2025Updated 10 months ago
nickjw0205 / Improving-ASR-with-LLM-Description
View on GitHub
☆20Sep 2, 2024Updated last year
kjw11 / Speaker-Aware-CTC
View on GitHub
Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.
☆22May 26, 2025Updated last year
ga642381 / SpeechPrompt
View on GitHub
**Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speec…
☆102Apr 10, 2025Updated last year
zelaki / DreamSound
View on GitHub
[ICASSP'24] Investigating Personalization Methods in Text to Music Generation
☆47Mar 27, 2024Updated 2 years ago
hltcoe / gazetteer-collection
View on GitHub
☆12Mar 31, 2020Updated 6 years ago
MTG / carnatic-separation-ismir23
View on GitHub
Carnatic singing voice separation trained with in-domain data with leakage
☆11Nov 5, 2023Updated 2 years ago
AlyssaYoung / AVQA
View on GitHub
ACM MM 2022 paper_AVQA: A Dataset for Audio-Visual Question Answering on Videos
☆15Aug 17, 2023Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
zszheng147 / VoiceCraft-X
View on GitHub
☆42Nov 18, 2025Updated 8 months ago
01Zhangbw / Speech-and-audio-papers-Top-Conference
View on GitHub
☆141Jan 24, 2026Updated 6 months ago
soham97 / mellow
View on GitHub
small audio language model for reasoning
☆88Dec 4, 2025Updated 7 months ago
GLJS / AudioToolAgent
View on GitHub
GitHub repository for AudioToolAgent
☆20Feb 13, 2026Updated 5 months ago
qiuk2 / AAR
View on GitHub
[Official Implementation] Acoustic Autoregressive Modeling 🔥
☆74Aug 24, 2024Updated last year
Ruiqi-Yan / Awesome-Audio-Editing
View on GitHub
A curated list of models, benchmarks, tools and guides for audio editing
☆35Jul 7, 2026Updated 3 weeks ago
tencentmusic / TME-Audio-Super-Resolution-Samples
View on GitHub
Audio samples for the paper 'Phase-aware music super-resolution using generative adversarial networks'
☆14May 15, 2020Updated 6 years ago