☆20Jan 22, 2026Updated 2 months ago
Alternatives and similar repositories for UITron-Speech
Users that are interested in UITron-Speech are comparing it to the libraries listed below.
- ☆66Sep 6, 2025Updated 7 months ago
- Tracking the latest and greatest research papers on diffusion large language models.☆33Mar 13, 2026Updated 3 weeks ago
- Graph Convolutional Module for Temporal Action Localization in Videos☆10Jul 4, 2020Updated 5 years ago
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model☆13Feb 11, 2025Updated last year
- An album application.☆15Oct 28, 2025Updated 5 months ago
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding.☆49Sep 15, 2025Updated 6 months ago
- ☆15Dec 11, 2023Updated 2 years ago
- (CVPR 26 Findings) Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-…☆34Sep 25, 2025Updated 6 months ago
- Official code for the ICLR 2025 paper, "Ada-K Routing: Boosting the Efficiency of MoE-based LLMs"☆12Mar 1, 2025Updated last year
- ☆10Apr 22, 2021Updated 4 years ago
- [ICLR 2020] Haotao Wang, Tianlong Chen, Zhangyang Wang, Kede Ma, "I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifie…☆20Dec 30, 2021Updated 4 years ago
- The official repo for "Unified Domain Adaptive Semantic Segmentation" (IEEE TPAMI 2025)☆33Aug 14, 2025Updated 7 months ago
- rkllm_talking is a standalone compiled voice communication system based on a large model☆13Oct 13, 2024Updated last year
- Scene Parsing via Integrated Classification Model and Variance-Based Regularization (Matlab&Caffe), In CVPR 2019☆11Jun 11, 2019Updated 6 years ago
- [ICLR 2026] VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications☆108Feb 22, 2026Updated last month
- SimOn: A Simple Framework for Online Temporal Action Localization☆22Nov 12, 2022Updated 3 years ago
- Codes for our ICLR2020 paper: Knowledge Consistency between Neural Networks and Beyond☆16Jan 11, 2020Updated 6 years ago
- ☆14Dec 12, 2023Updated 2 years ago
- (ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec☆47May 16, 2025Updated 10 months ago
- LLaVA_OpenVLA part 2, Generate MLLM general training data☆11Dec 27, 2024Updated last year
- Advanced Video Graph RAG using SAM2, CLIP, BLIP, Qwen2-VL, YOLO-World, Neo4j, WebGPU, local LLM☆14Nov 25, 2024Updated last year
- [AAAI 2022] DCAN: Improving Temporal Action Detection via Dual Context Aggregation☆17Nov 13, 2022Updated 3 years ago
- ☆33Sep 19, 2025Updated 6 months ago
- [EMNLP 2024] Multi-modal reasoning problems via code generation.☆28Feb 5, 2025Updated last year
- [ICLR 2022] Official Code Repository for "TRGP: TRUST REGION GRADIENT PROJECTION FOR CONTINUAL LEARNING"☆22Oct 5, 2022Updated 3 years ago
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation☆29Dec 22, 2025Updated 3 months ago
- [NeurIPS 2025] UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents☆55Nov 27, 2025Updated 4 months ago
- Codes available of a paper: An Efficient Cervical Whole Slide Image Analysis Framework Based on Multi-scale Semantic and Location Deep Fe…☆16Jul 26, 2022Updated 3 years ago
- Property management system (frontend)☆39Dec 4, 2024Updated last year
- [ECCV 2022] Official Pytorch Implementation of the paper: "Semi-Supervised Temporal Action Detection with Proposal-Free Masking"☆21Jun 20, 2023Updated 2 years ago
- Database of "Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model", ECCV 2020☆13May 2, 2022Updated 3 years ago
- Where is this IP?☆14Feb 24, 2024Updated 2 years ago
- Learning to Discriminate Information for Online Action Detection, CVPR 2020☆27Mar 24, 2023Updated 3 years ago
- First steps with ORTC☆10Feb 3, 2019Updated 7 years ago
- A car re-identification app based on multi-feature fusion technique☆18Apr 24, 2022Updated 3 years ago
- UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models☆110Oct 30, 2025Updated 5 months ago
- Imported from https://gitorious.org/beagleboard-usbsniffer/☆15Oct 19, 2015Updated 10 years ago
- An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design☆22Dec 13, 2024Updated last year
- AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.☆27Mar 21, 2024Updated 2 years ago