aimclub / OCEANAI
Algorithms for Intelligent Assessment of Human Personality Traits based on His Multimodal Data for ranking potential candidates to perform professional responsibilities
☆43Updated 7 months ago
Alternatives and similar repositories for OCEANAI
Users that are interested in OCEANAI are comparing it to the libraries listed below
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆63Updated this week
- ☆203Updated 2 months ago
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification.☆21Updated 4 months ago
- ☆51Updated 3 weeks ago
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition☆15Updated last year
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation☆175Updated 2 months ago
- 😎 Awesome lists about Speech Emotion Recognition☆93Updated 7 months ago
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers☆56Updated 2 months ago
- TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog☆56Updated last year
- Use quantized versions of Whisper to speed up inference☆12Updated 9 months ago
- ☆14Updated last year
- A composition of offline tools to achieve high quality multilingual speech to text transcription☆19Updated last month
- ☆62Updated last year
- ☆15Updated 3 months ago
- Open TTS models, built for streaming on the edge☆43Updated 4 months ago
- ☆18Updated 10 months ago
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆82Updated last year
- Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model☆180Updated 11 months ago
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition.☆15Updated 2 months ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,…☆75Updated 9 months ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation☆148Updated last year
- Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation☆16Updated 2 years ago
- Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech …☆28Updated 2 months ago
- [Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization☆56Updated 4 months ago
- Audio tokenization, in the fastest way possible!☆52Updated 11 months ago
- Add n-gram and large language model (LLM) support to Whisper models.☆30Updated 2 months ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units☆40Updated 9 months ago
- The official implementation of "A Language Modeling Approach to Diacritic-Free Hebrew TTS"☆100Updated last month
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS.☆83Updated 8 months ago
- A pipeline to read lips and generate speech for the read content, i.e., lip-to-speech synthesis.☆86Updated this week