kehanlu/Speech-IFEval

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kehanlu/Speech-IFEval)

kehanlu / Speech-IFEval

Leaderboard and code for "Speech-IFEval", Interspeech 2025

☆24

Alternatives and similar repositories for Speech-IFEval

Users that are interested in Speech-IFEval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Alfred0622 / HypR
View on GitHub
A benchmark corpus for ASR hypothesis revising task
☆21Sep 26, 2023Updated 2 years ago
kehanlu / DeSTA2.5-Audio
View on GitHub
Code for DeSTA2.5-Audio, general-purpose LALM
☆141Feb 4, 2026Updated 5 months ago
kehanlu / DeSTA2
View on GitHub
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
☆127Jul 15, 2025Updated last year
kehanlu / Mandarin-Wav2Vec2
View on GitHub
Pre-trained Wav2vec2.0 for Mandarin
☆43Oct 30, 2022Updated 3 years ago
kehanlu / python
View on GitHub
臺科大程式設計社 2019 spring
☆25May 28, 2019Updated 7 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
ckyang1124 / LALM-Evaluation-Survey
View on GitHub
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
☆41Aug 11, 2025Updated 11 months ago
YosukeHiguchi / espnet
View on GitHub
End-to-End Speech Processing Toolkit
☆16Jan 20, 2025Updated last year
splitline / NTUSTapp
View on GitHub
一個屬於台科人的 App。
☆13Jul 30, 2018Updated 7 years ago
OptimusPrimus / tacos
View on GitHub
Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
☆16Oct 12, 2025Updated 9 months ago
yangjingyuan / ConstDecoder
View on GitHub
☆11Oct 24, 2022Updated 3 years ago
BUTSpeechFIT / SOT-DiCoW
View on GitHub
Multi-talker ASR based on DiCoW with Serialized Output Training
☆21Sep 18, 2025Updated 10 months ago
jishengpeng / WavReward
View on GitHub
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
☆56May 15, 2025Updated last year
kehanlu / University
View on GitHub
臺科併校小幫手 🍡
☆13Apr 21, 2023Updated 3 years ago
llm-jp / llama-mimi
View on GitHub
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…
☆31Sep 20, 2025Updated 10 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
monglechap / fluenttts
View on GitHub
FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS
☆20Nov 15, 2022Updated 3 years ago
LAION-AI / emotion-annotations
View on GitHub
☆110Jul 15, 2026Updated 2 weeks ago
ddlBoJack / MMAR
View on GitHub
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214Feb 25, 2026Updated 5 months ago
NTRLab / MediaSpeech
View on GitHub
☆22Jul 22, 2022Updated 4 years ago
narihira2000 / GAS-NTUST-bulletin
View on GitHub
一個透過Google App Script發送台科公佈欄資訊的機器人
☆23Sep 22, 2022Updated 3 years ago
jaehyeongAN / KoELECTRA-finetuned-sentiment-analysis
View on GitHub
Generalized Sentiment Classifier finetuned by KoELECTRA
☆11Nov 28, 2024Updated last year
kehanlu / server-monitor
View on GitHub
A light webserver for monitoring RAM and GPU usage on multiple servers.
☆21Mar 31, 2021Updated 5 years ago
wx9Songs / MOSS-Music-Data-Pipeline
View on GitHub
☆44Apr 26, 2026Updated 3 months ago
sarulab-speech / DuplexChat
View on GitHub
☆47Jul 5, 2026Updated 3 weeks ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
thuhcsi / SpeechCraft
View on GitHub
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆198Feb 28, 2026Updated 5 months ago
the-bird-F / GLM-Voice-RAG
View on GitHub
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31Jul 11, 2025Updated last year
FudanCVL / AVI-Bench
View on GitHub
[ICML'26] Toward Human-like Audio-Visual Intelligence of Omni-MLLMs
☆16Jun 20, 2026Updated last month
CASIA-LM / OpenS2S
View on GitHub
OpenS2S : Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
☆119Mar 28, 2026Updated 4 months ago
thu-spmi / CTC-TTS
View on GitHub
Code for CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment, Interspeech 2026.
☆20Jun 9, 2026Updated last month
IDEA-Emdoor-Lab / DistilCodec
View on GitHub
A Neural Audio Codec (NAC) for Universal Audio
☆47May 30, 2025Updated last year
AGENDD / RWKV-SpeechChat
View on GitHub
RWKV-SpeechChat is a real-time dialogue script based on a frozen 3B RWKV model with trained adapters and initial states. Various trained …
☆29Jan 1, 2025Updated last year
NieeiM / Dasheng-Audiogen
View on GitHub
Generate a complete audio clip with music, intelligible speech, and sound effects from text in one pass.
☆44May 27, 2026Updated 2 months ago
yangdongchao / ALMTokenizer2
View on GitHub
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆45Sep 5, 2025Updated 10 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Vyvo-Labs / CodecHub
View on GitHub
CodecHub: A Unified Library for Codec Models
☆25Dec 24, 2025Updated 7 months ago
Audio-WestlakeU / UMA-ASR
View on GitHub
This repository is the official implementation of unimodal aggregation (UMA) for automaticspeech recognition (ASR).
☆35Dec 17, 2024Updated last year
iamanigeeit / present
View on GitHub
☆14Aug 19, 2024Updated last year
xiaomi-research / dasheng-audiogen
View on GitHub
end-to-end text to audio scene generation model
☆50Jun 16, 2026Updated last month
kjw11 / Speaker-Aware-CTC
View on GitHub
Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.
☆22May 26, 2025Updated last year
LAION-AI / Desktop-BUD-E_V1.0
View on GitHub
BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre…
☆23Oct 10, 2024Updated last year
OFA-Sys / AIR-Bench
View on GitHub
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
☆133Dec 9, 2024Updated last year