the-bird-F/GLM-Voice-RAG

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/the-bird-F/GLM-Voice-RAG)

the-bird-F / GLM-Voice-RAG

[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E2E Retrieval.

☆31

Alternatives and similar repositories for GLM-Voice-RAG

Users that are interested in GLM-Voice-RAG are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

jishengpeng / WavReward
View on GitHub
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
☆56May 15, 2025Updated last year
Ereboas / MagiCodec
View on GitHub
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆118Jun 4, 2025Updated last year
slp-rl / SpokenStoryCloze
View on GitHub
A spoken version of the textual story cloze benchmark
☆22Aug 6, 2023Updated 2 years ago
a43992899 / openl2s
View on GitHub
Open, royalty free, lyrics2song / song generation data collection / cleaning pipeline.
☆17May 9, 2025Updated last year
yangdongchao / ALMTokenizer2
View on GitHub
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆45Sep 5, 2025Updated 10 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
the-bird-F / Expressive-Vectors
View on GitHub
[ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis
☆40Dec 24, 2025Updated 6 months ago
ASLP-lab / Easy-Turn
View on GitHub
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
☆118Jan 25, 2026Updated 5 months ago
kehanlu / DeSTA2.5-Audio
View on GitHub
Code for DeSTA2.5-Audio, general-purpose LALM
☆140Feb 4, 2026Updated 5 months ago
xzf-thu / Audio-Reasoner
View on GitHub
The first Large Audio Language Model that enables native in-depth thinking, which is trained on large-scale audio Chain-of-Thought data.
☆296May 15, 2025Updated last year
ZhikangNiu / Semantic-VAE
View on GitHub
[INTERSPEECH 2026 Oral]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆120Jun 21, 2026Updated 2 weeks ago
zszheng147 / VoiceCraft-X
View on GitHub
☆40Nov 18, 2025Updated 7 months ago
ddlBoJack / Omni-Captioner
View on GitHub
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆140Apr 7, 2026Updated 3 months ago
bigai-nlco / UltraVoice
View on GitHub
Official Repository of UltraVoice
☆62Oct 28, 2025Updated 8 months ago
wuzhiyue111 / Codec-Evaluation
View on GitHub
☆48Apr 5, 2026Updated 3 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
xiaomi-research / mecat
View on GitHub
☆42May 12, 2026Updated 2 months ago
hhguo / SoCodec
View on GitHub
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92Dec 20, 2024Updated last year
yanghaha0908 / EmoVoice
View on GitHub
Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"
☆115Oct 16, 2025Updated 8 months ago
zqs01 / data2vecnoisy
View on GitHub
☆11Oct 20, 2022Updated 3 years ago
Ruiqi-Yan / URO-Bench
View on GitHub
Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
☆55Sep 2, 2025Updated 10 months ago
kehanlu / DeSTA2
View on GitHub
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
☆127Jul 15, 2025Updated 11 months ago
ajd12342 / paraspeechcaps
View on GitHub
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
☆162Mar 26, 2026Updated 3 months ago
liyunlongaaa / AD-TUNING
View on GitHub
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th…
☆11Feb 23, 2024Updated 2 years ago
LAION-AI / emotional-speech-annotations
View on GitHub
This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models
☆35Oct 13, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
CarlWangChina / MuChin
View on GitHub
MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music
☆26Jan 7, 2026Updated 6 months ago
caizexin / GenVC
View on GitHub
Self-supervised Generative LM-based Voice Conversion
☆58Apr 24, 2025Updated last year
XiaomiMiMo / MiMo-Audio-Tokenizer
View on GitHub
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
☆144Sep 19, 2025Updated 9 months ago
ddlBoJack / Awesome-Speech-Language-Model
View on GitHub
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
☆202Jun 7, 2026Updated last month
baichuan-inc / Baichuan-Audio
View on GitHub
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
☆222Feb 28, 2025Updated last year
ydqmkkx / ShallowFlowMatching-TTS
View on GitHub
Official implementation of paper: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
☆55Sep 20, 2025Updated 9 months ago
zszheng147 / Spatial-AST
View on GitHub
🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)
☆87Feb 13, 2025Updated last year
Jiang-Yidi / UniCodec
View on GitHub
[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…
☆156May 30, 2025Updated last year
0nutation / USLM
View on GitHub
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)
☆152Sep 14, 2023Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
soham97 / mellow
View on GitHub
small audio language model for reasoning
☆87Dec 4, 2025Updated 7 months ago
InternLM / StarBench
View on GitHub
[ICLR 2026] An official implementation of "STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence"
☆42Apr 19, 2026Updated 2 months ago
hhguo / FastGriffinLim_Pytorch
View on GitHub
☆13Nov 16, 2020Updated 5 years ago
NKU-HLT / KNN-CTC
View on GitHub
[ICASSP 2024] KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
☆42Mar 20, 2024Updated 2 years ago
Ruiqi-Yan / Awesome-Full-Duplex-SDM
View on GitHub
A curated list of full-duplex spoken dialogue models & benchmarks
☆110Jun 25, 2026Updated 2 weeks ago
SonyResearch / VRVQ
View on GitHub
Variable Bitrate Residual Vector Quantization for Audio Coding
☆53May 1, 2025Updated last year
JishengBai / AudioSetCaps
View on GitHub
A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline
☆207Dec 13, 2024Updated last year