UITron-hub/UITron-Speech

If you own this repo, copy the snippet below and add it to your README.md:

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/UITron-hub/UITron-Speech)

UITron-hub / UITron-Speech

☆21

Alternatives and similar repositories for UITron-Speech

Users who are interested in UITron-Speech are comparing it to the repositories listed below.

UITron-hub / UItron
☆67 · Sep 6, 2025 · Updated 10 months ago
DocTron-hub / OCRVerse
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
☆30 · Feb 4, 2026 · Updated 5 months ago
LaoKuiZe / AppAgent-Pro
☆16 · Aug 27, 2025 · Updated 10 months ago
AmphionTeam / TARS
[ACL 2026] Closing the Modality Reasoning Gap for Speech Large Language Models
☆15 · Apr 17, 2026 · Updated 3 months ago
IMYangJinheng / DeepCFD-for-Prediction-of-flow-field-in-Laval-nozzle
This is a U-Net-based deep learning model, which we call DeepCFD. You can use this model to predict the temperature, velocity, and pressu…
☆15 · Sep 30, 2025 · Updated 9 months ago
JiuTian-VL / SimpAgent
[ICCV 2025 Highlight] Less is More: Empowering GUI Agent with Context-Aware Simplification
☆48 · Mar 12, 2026 · Updated 4 months ago
V-Droid-Agent / V-Droid
Source code of the paper "V-Droid: Advancing Mobile GUI Agent Through Generative Verifiers"
☆35 · Feb 2, 2026 · Updated 5 months ago
inclusionAI / Ming-Freeform-Audio-Edit
☆15 · Oct 27, 2025 · Updated 8 months ago
sdbds / florence2-ft-advanced
Fine-tune your Florence-2 model easily
☆21 · Jul 27, 2024 · Updated last year
facebookresearch / flowception
Authors' implementation of "Flowception: Temporally Expansive Flow Matching for Video Generation".
☆21 · May 9, 2026 · Updated 2 months ago
iLearn-Lab / ACL25-GUI-explorer
[ACL 2025] GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
☆68 · May 28, 2025 · Updated last year
OpenGVLab / ZeroGUI
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
☆119 · Jul 17, 2025 · Updated last year
vspeech / Awesome-TTS-Survey
A list of widely used open-source autoregressive and non-autoregressive TTS models
☆20 · Apr 13, 2026 · Updated 3 months ago
Euphoria16 / UI-Genie
[NeurIPS 2025] UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
☆60 · Nov 27, 2025 · Updated 7 months ago
ivattyue / Ada-K
Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs"
☆12 · Mar 1, 2025 · Updated last year
nosna / miragenews
☆16 · May 14, 2025 · Updated last year
tiangeluo / RegionFocus
A simple visual test-time scaling method for GUI agent grounding
☆26 · Dec 7, 2025 · Updated 7 months ago
mariegold / NP-Attack
☆10 · Mar 22, 2022 · Updated 4 years ago
MagicAgent-GUI / MagicGUI
☆80 · Sep 3, 2025 · Updated 10 months ago
jefferyZhan / GThinker
[CVPR 2026] GThinker, Reasoning MLLM, Visual Cues, Visual Rethinking
☆18 · Mar 9, 2026 · Updated 4 months ago
VITA-Group / MAD
[ICLR 2020] Haotao Wang, Tianlong Chen, Zhangyang Wang, Kede Ma, "I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifie…
☆20 · Dec 30, 2021 · Updated 4 years ago
google-research-datasets / screen_qa
The ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K …
☆150 · Feb 7, 2025 · Updated last year
shihengcan / ICM-matcaffe
Scene Parsing via Integrated Classification Model and Variance-Based Regularization (MATLAB & Caffe), in CVPR 2019
☆12 · Jun 11, 2019 · Updated 7 years ago
irfan112 / yowov3-multistreaming-inferencing
Real-time multistream inference with YOWOv3 (spatio-temporal action detection task) using the UCF101-24 dataset. The repo is extension …
☆26 · May 15, 2026 · Updated 2 months ago
mapooon / Face2Diffusion
[CVPR 2024] Face2Diffusion for Fast and Editable Face Personalization https://arxiv.org/abs/2403.05094
☆96 · Mar 28, 2024 · Updated 2 years ago
vivo / DiMo-GUI
[EMNLP 2025] Repository for paper "DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning"
☆30 · Jul 2, 2025 · Updated last year
Ruby-He / ProTegO
[MM'23] ProTegO: Protect Text Content against OCR Extraction Attack
☆14 · Mar 12, 2024 · Updated 2 years ago
TuanTNG / SimOn
SimOn: A Simple Framework for Online Temporal Action Localization
☆22 · Nov 12, 2022 · Updated 3 years ago
ameya005 / Semantic_Adversarial_Attacks
Code for Semantic Adversarial Attacks
☆11 · Oct 12, 2021 · Updated 4 years ago
showlab / FocusUI
[CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
☆35 · Jun 7, 2026 · Updated last month
ZiangLong / LPCNet_pytorch
A PyTorch version of LPCNet, including weight dumping
☆36 · May 5, 2022 · Updated 4 years ago
admineral / RAG-X
Advanced Video Graph RAG using SAM2, CLIP, BLIP, Qwen2-VL, YOLO-World, Neo4j, WebGPU, local LLM
☆14 · Nov 25, 2024 · Updated last year
Exploring-Embodied-Emotion-official / E3
☆25 · Jul 1, 2025 · Updated last year
TencentARC / OmniScript
OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
☆18 · Apr 24, 2026 · Updated 2 months ago
wxxhub / SpeechSynthesisServer
Speech synthesis service
☆12 · Mar 18, 2023 · Updated 3 years ago
MinglangQiao / MVVA-Database
Database of "Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model", ECCV 2020
☆13 · May 2, 2022 · Updated 4 years ago
Andong-Li-speech / MDNet
The implementation of MDNet, which is in submission to Interspeech 2022
☆14 · May 1, 2022 · Updated 4 years ago
aburns4 / textualforesight
☆12 · Aug 8, 2024 · Updated last year
yzyouzhang / Awesome-Multimedia-Deepfake-Detection
Materials for "Multimedia Deepfake Detection" Tutorial @ ICME 2024
☆17 · Aug 26, 2024 · Updated last year