xiaomi-research/controlfoley


[ACM MM 2026] ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling

☆142

Alternatives and similar repositories for controlfoley

Users that are interested in controlfoley are comparing it to the libraries listed below.

xiaomi-research / mecat
☆44 · May 12, 2026 · Updated 2 months ago
Darwin-Agent / awesome-world-models-for-digital-agents
Digital Agents Meet World Models: A Survey
☆50 · May 8, 2026 · Updated 2 months ago
SGUN-father / comfyui-controlfoley
神棍
☆15 · May 1, 2026 · Updated 2 months ago
SeerRay-Lab / Xiaomi-GUI-0
[Technical Report] An End-to-End Multimodal GUI Agent for Real Mobile Environments
☆79 · Updated this week
xiaomi-research / dasheng-audiogen
End-to-end text-to-audio scene generation model
☆50 · Jun 16, 2026 · Updated last month
xiaomi-research / dasheng-tokenizer
State-of-the-art continuous audio tokenization
☆40 · Mar 9, 2026 · Updated 4 months ago
xiaomi-research / q-frame
[ICCV 2025] Implementation of the paper "Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs"
☆81 · Oct 25, 2025 · Updated 8 months ago
zeyuxie29 / SemanticVocoder
☆28 · Apr 6, 2026 · Updated 3 months ago
NieeiM / Dasheng-Audiogen
Generate a complete audio clip with music, intelligible speech, and sound effects from text in one pass.
☆44 · May 27, 2026 · Updated last month
NJU-Speech / Foley-Omni
Foley-Omni: a unified multimodal audio generation model for task-level synthesis and complete video soundtrack generation, producing spee…
☆24 · Jun 5, 2026 · Updated last month
xiaomi-research / acavcaps
☆31 · Mar 27, 2026 · Updated 3 months ago
sony / mmaudiosep
☆16 · Apr 30, 2026 · Updated 2 months ago
ASLP-lab / FMSU
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
☆25 · May 21, 2026 · Updated 2 months ago
ASLP-lab / FlashTTS
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
☆63 · Jun 16, 2026 · Updated last month
xiaomi-research / gemmax
Gemma-based Multilingual Machine Translation Models
☆51 · Feb 13, 2026 · Updated 5 months ago
yanghaha0908 / WavCube
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"
☆62 · Jun 27, 2026 · Updated 3 weeks ago
fblissjr / krea-explorations
interpretability work and exploration for krea
☆43 · Jul 12, 2026 · Updated last week
xiaomi-research / dasheng-glap
Official Implementation of GLAP - General Language Audio Pretraining
☆74 · May 14, 2026 · Updated 2 months ago
lizhaoqing / UNISON
☆43 · Jun 3, 2026 · Updated last month
xiaomi-research / tts-prism
☆47 · Apr 27, 2026 · Updated 2 months ago
cwx-worst-one / WavTTS
WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling
☆209 · Jun 6, 2026 · Updated last month
buptlihang / CVLM
☆23 · Jan 8, 2024 · Updated 2 years ago
inclusionAI / Ming-omni-tts
Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control
☆263 · Feb 26, 2026 · Updated 4 months ago
HY-SpongeBob / HY-SpongeBob
☆26 · May 26, 2026 · Updated last month
XXH333 / WordVoice-main
The inference and training code for WordVoice.
☆51 · Updated this week
trinhtuanvubk / KWS-BCResnet
Keyword Spotting using BCResNet and Arcface Loss
☆13 · Jan 28, 2022 · Updated 4 years ago
pixixai / ComfyUI-CreativeLab
ComfyUI batch processing tool
☆16 · Apr 6, 2026 · Updated 3 months ago
smthemex / ComfyUI_JoyAI_Echo
Pushing the Frontier of Long Video Generation. Standalone, inference-only release for minute-level multi-shot audio-video generation with…
☆57 · Jun 23, 2026 · Updated 3 weeks ago
xiaomi-research / btl-ui
[NeurIPS 2025] Implementation of the paper "BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent"
☆19 · Nov 27, 2025 · Updated 7 months ago
lmxue / NVV-SuperBench
NVV-SuperBench: Beyond Words, Beyond Quality—Benchmarking Nonverbal Vocalizations in Speech Generation (Interspeech 2026 long paper)
☆18 · Jun 21, 2026 · Updated last month
stepfun-ai / StepAudio-Skills
Audio skills for Claw
☆27 · Apr 16, 2026 · Updated 3 months ago
Ruiqi-Yan / Awesome-Audio-Editing
A curated list of models, benchmarks, tools and guides for audio editing
☆34 · Jul 7, 2026 · Updated 2 weeks ago
FunAudioLLM / FunCineForge
☆442 · Mar 25, 2026 · Updated 3 months ago
thu-spmi / CTC-TTS
Code for CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment, Interspeech 2026.
☆20 · Jun 9, 2026 · Updated last month
fblissjr / ComfyUI-AudioLoopHelper
ComfyUI-AudioLoopHelper
☆16 · Jul 5, 2026 · Updated 2 weeks ago
sunnyxrxrx / X-Voice
X-Voice
☆176 · Jun 5, 2026 · Updated last month
wsntxxn / UniFlow-Audio
☆72 · Updated this week
scottishfold0621 / ACMID
☆26 · Apr 30, 2026 · Updated 2 months ago
smthemex / ComfyUI_Streamv2v_Plus
You can use Streamv2v in ComfyUI
☆12 · Sep 6, 2024 · Updated last year