stepfun-ai/Step-Realtime-Console

☆74

Alternatives and similar repositories for Step-Realtime-Console

Users who are interested in Step-Realtime-Console are comparing it to the repositories listed below.

jczhang02 / MUSIC_dataset_script
This repo contains a script to download the MUSIC dataset from YouTube
☆12 · Jan 19, 2024 · Updated 2 years ago
stepfun-ai / Step-Audio2
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…
☆1,486 · Mar 16, 2026 · Updated 4 months ago
MegEngine / MegCat
A Deep Learning Project about cats.
☆11 · Aug 8, 2022 · Updated 3 years ago
Hannieliao / Baton
Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"
☆32 · Mar 4, 2025 · Updated last year
pengzhendong / streaming-tts-webui
Streaming Text to Speech Web UI
☆22 · May 6, 2024 · Updated 2 years ago
MegEngine / examples
A set of examples around MegEngine
☆31 · Dec 8, 2023 · Updated 2 years ago
stepfun-ai / StepAudio-Skills
Audio skills for Claw
☆27 · Apr 16, 2026 · Updated 3 months ago
megvii-research / Occ2net
☆12 · Sep 23, 2023 · Updated 2 years ago
sarulab-speech / DuplexChat
☆46 · Jul 5, 2026 · Updated 2 weeks ago
MegEngine / MegDiffusion
MegEngine implementation of Diffusion Models.
☆19 · Aug 8, 2022 · Updated 3 years ago
choijeongsoo / av2av
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
☆48 · Sep 6, 2024 · Updated last year
stepfun-ai / StepFun-Prover-Preview
Large language models designed for formal theorem proving through tool-integrated reasoning.
☆33 · Aug 13, 2025 · Updated 11 months ago
Mddct / cosyvoice2-flow-optimized
faster inference
☆27 · Jan 20, 2025 · Updated last year
megvii-research / MEMD
Megvii Electric Moped Detector (ONNX based inference)
☆13 · Jul 4, 2021 · Updated 5 years ago
zalan159 / long-running-coding-loop
Long-running autonomous coding loop: implement → test → fix, powered by AI agents (Claude Code & Codex)
☆15 · Apr 2, 2026 · Updated 3 months ago
y-ren16 / OV-InstructTTS
☆22 · Jan 27, 2026 · Updated 5 months ago
ictnlp / SLED-TTS
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
☆108 · May 20, 2025 · Updated last year
XiaomiMiMo / lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
☆70 · Aug 8, 2025 · Updated 11 months ago
pengzhendong / streaming-ChatTTS
☆23 · Oct 30, 2024 · Updated last year
megvii-research / basedet
An object detection codebase based on MegEngine.
☆28 · Dec 14, 2022 · Updated 3 years ago
megvii-research / basecls
A codebase & model zoo for pretrained backbone based on MegEngine.
☆32 · Mar 6, 2023 · Updated 3 years ago
megvii-research / juicefs-python
JuiceFS Python SDK
☆23 · Oct 15, 2021 · Updated 4 years ago
hzwer / MM2022-ViCoPerceptualHeadGeneration
MM2022 Workshop-Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer
☆55 · May 16, 2024 · Updated 2 years ago
huangruizhe / audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
☆10 · Sep 30, 2024 · Updated last year
megvii-research / zipfls
This repo is the official MegEngine implementation of the ECCV 2022 paper: Efficient One Pass Self-distillation with Zipf's Label Smoothin…
☆27 · Oct 19, 2022 · Updated 3 years ago
1171-jpg / MARVEL_AVR
GitHub repo for MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
☆18 · Jun 12, 2024 · Updated 2 years ago
inclusionAI / Ming-Freeform-Audio-Edit
☆15 · Oct 27, 2025 · Updated 8 months ago
NJU-PCALab / LUVE
[ICML 2026] LUVE: Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts
☆16 · May 11, 2026 · Updated 2 months ago
IDEA-Emdoor-Lab / UniTTS
A TTS Trained on Universal Audio.
☆41 · Jun 6, 2025 · Updated last year
mutiann / speech_rankings
A CSRankings-like index for speech researchers
☆35 · Oct 16, 2024 · Updated last year
ASLP-lab / Easy-Turn
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
☆122 · Jan 25, 2026 · Updated 5 months ago
stoneMo / OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 · Jun 1, 2023 · Updated 3 years ago
yysirs / ChatSQL
Natural language to SQL, querying through a direct database connection, 翌金科技 @yysirs
☆19 · May 24, 2023 · Updated 3 years ago
Katie0723 / Dynamic-DETR
☆11 · Jan 12, 2023 · Updated 3 years ago
stepfun-ai / Step-Audio-EditX
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing emotion, speaking style, and paralinguistics…
☆953 · Apr 9, 2026 · Updated 3 months ago
kyutai-labs / moshi-finetune
☆474 · Oct 3, 2025 · Updated 9 months ago
JishengBai / AudioSetCaps
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
☆208 · Dec 13, 2024 · Updated last year
samsad35 / code-ancogen
[ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
☆14 · Mar 11, 2025 · Updated last year
Tayjsl97 / RL-Chord
This is the official implementation of RL-Chord (TNNLS).
☆13 · Jan 2, 2024 · Updated 2 years ago