lysanderism/TimeAudio

The official repository for TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into large audio language models (LALMs) with enhanced module designs and a specially curated dataset.

☆30

Alternatives and similar repositories for TimeAudio

Users that are interested in TimeAudio are comparing it to the libraries listed below.

GLJS / AudioToolAgent
GitHub repository for AudioToolAgent
☆20 · Feb 13, 2026 · Updated 5 months ago
xiquan-li / Resonate
[INTERSPEECH 2026] Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation
☆48 · Apr 17, 2026 · Updated 3 months ago
wdqqdw / Echo
Project page of "2026-ICLR Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning"
☆16 · Mar 26, 2026 · Updated 3 months ago
JHU-LCAP / FlexSED
open-vocabulary sound event detection
☆53 · Dec 17, 2025 · Updated 7 months ago
zeyuxie29 / SemanticVocoder
☆28 · Apr 6, 2026 · Updated 3 months ago
CPJKU / cpjku_dcase24
☆29 · Oct 17, 2024 · Updated last year
Ming-er / LGC-SED
☆13 · Jan 3, 2024 · Updated 2 years ago
soham97 / mellow
small audio language model for reasoning
☆88 · Dec 4, 2025 · Updated 7 months ago
NieeiM / Dasheng-Audiogen
Generate a complete audio clip with music, intelligible speech, and sound effects from text in one pass.
☆44 · May 27, 2026 · Updated last month
OptimusPrimus / tacos
Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
☆16 · Oct 12, 2025 · Updated 9 months ago
NKU-HLT / DIFFA
[AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model
☆83 · Apr 7, 2026 · Updated 3 months ago
roudimit / Omni-R1
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
☆47 · Nov 21, 2025 · Updated 8 months ago
ddlBoJack / Omni-Captioner
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆142 · Apr 7, 2026 · Updated 3 months ago
Ruiqi-Yan / Awesome-Audio-Editing
A curated list of models, benchmarks, tools and guides for audio editing
☆34 · Jul 7, 2026 · Updated 2 weeks ago
FreedomIntelligence / EchoX
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
☆47 · Sep 19, 2025 · Updated 10 months ago
fschmid56 / PretrainedSED
☆145 · May 13, 2025 · Updated last year
juhayna-zh / AudioControlNet
Official repository for the paper "Audio ControlNet for Fine-Grained Audio Generation and Editing".
☆77 · Feb 7, 2026 · Updated 5 months ago
frankenliu / LOAE
☆10 · Sep 25, 2024 · Updated last year
xiquan-li / TinyMU
[ICASSP 2026] TinyMU: A Compact Audio Language Model for Music Understanding
☆36 · Apr 20, 2026 · Updated 3 months ago
khfs / DuplexMamba
☆18 · Mar 6, 2026 · Updated 4 months ago
Sakshi113 / MMAU
☆156 · Feb 9, 2026 · Updated 5 months ago
xiquan-li / FineLAP
[ACL 2026 Main] FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pre-training
☆36 · Apr 20, 2026 · Updated 3 months ago
xiquan-li / MeanAudio
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆142 · Sep 2, 2025 · Updated 10 months ago
xiaomi-research / dasheng-tokenizer
State-of-the-art continuous audio tokenization
☆40 · Mar 9, 2026 · Updated 4 months ago
MaikeZuefle / f-actor
☆28 · Jul 17, 2026 · Updated last week
wsntxxn / UniFlow-Audio
☆74 · Jul 17, 2026 · Updated last week
LVYUERLVR / OutboundEval-Xbench
OutboundEval, a comprehensive benchmark for evaluating large language models (LLMs) in expert-level intelligent outbound calling scenario…
☆17 · Oct 28, 2025 · Updated 8 months ago
zeyuxie29 / AudioTime
☆39 · Jul 4, 2024 · Updated 2 years ago
wsntxxn / TextToAudioGrounding
The dataset and baseline code for Text-to-Audio Grounding (TAG)
☆49 · Oct 23, 2025 · Updated 9 months ago
wonjune-kang / expressive-speech-retrieval
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15 · Aug 18, 2025 · Updated 11 months ago
Honee-W / U-SAM
Official repository for U-SAM (Interspeech 2025)
☆28 · Jun 3, 2025 · Updated last year
yfyeung / CLSP
[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
☆104 · Apr 6, 2026 · Updated 3 months ago
Harper812 / FFDConv
Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
☆27 · May 13, 2026 · Updated 2 months ago
umbertocappellazzo / Omni-AVSR
Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…
☆38 · Mar 10, 2026 · Updated 4 months ago
llm-jp / llama-mimi
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…
☆31 · Sep 20, 2025 · Updated 10 months ago
Ming-er / Audio-Free-P-Tuning
☆11 · Dec 28, 2023 · Updated 2 years ago
adobe-research / openflam
OpenFLAM: Framewise Language Audio Model
☆110 · Jun 4, 2026 · Updated last month
yhytoto12 / Behavior-SD
Official Implementation of NAACL 2025 Paper: Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language Models
☆18 · Apr 30, 2025 · Updated last year
pymaster17 / VocalParse
☆22 · Jun 18, 2026 · Updated last month