☆11Sep 1, 2024Updated last year
Alternatives and similar repositories for AVCap
Users who are interested in AVCap are comparing it to the libraries listed below
- ☆36Jan 20, 2025Updated last year
- Vocoder based on PC-DDSP and nsf-HiFiGAN☆18Jul 17, 2023Updated 2 years ago
- ☆13Sep 12, 2024Updated last year
- This is a PyTorch implementation of the paper "Reinforcement Learning-Based Black-Box Model Inversion Attacks" accepted by CVPR 2023.☆40May 4, 2023Updated 2 years ago
- ☆38Jan 8, 2026Updated last month
- ☆19Apr 18, 2024Updated last year
- ☆17Nov 15, 2022Updated 3 years ago
- ☆19Jun 28, 2022Updated 3 years ago
- Test-time adaptation for speech recognition model by single utterance. The official implementation of "Listen, Adapt, Better WER: Source-…☆20Apr 1, 2022Updated 3 years ago
- This is a codebase for I See-Through You: A Framework for Removing Foreground Occlusion in Both Sparse and Dense Light Field Images (WACV…☆18Apr 2, 2024Updated last year
- Test-time adaptation via Nearest neighbor information (TAST), submitted to ICLR'23☆24Jul 11, 2023Updated 2 years ago
- AI Development in Evolving Policy [AI DEP]☆46Jul 7, 2025Updated 7 months ago
- ☆28Mar 13, 2025Updated 11 months ago
- ☆31Oct 29, 2024Updated last year
- ☆33Dec 23, 2025Updated 2 months ago
- ☆28Mar 13, 2025Updated 11 months ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat…☆33Jun 14, 2024Updated last year
- Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech …☆33May 11, 2025Updated 9 months ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning" (ICASSP 2023)☆31Dec 6, 2023Updated 2 years ago
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis☆52Apr 9, 2025Updated 10 months ago
- [TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation☆31Sep 6, 2024Updated last year
- Code for the Interspeech 2024 paper "MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting"☆45Jan 24, 2026Updated last month
- This is a community implementation for the paper EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularizatio…☆35Aug 4, 2023Updated 2 years ago
- AlignNet: A Unifying Approach to Audio-Visual Alignment (WACV 2020)☆34Jan 10, 2021Updated 5 years ago
- Compute distribution-based quality metrics for audio data using embeddings, with a focus on music.☆43Jan 15, 2026Updated last month
- [ECCV'24] Official code for "BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation"☆42Nov 19, 2024Updated last year
- Code repo for LoCoNet: Long-Short Context Network for Active Speaker Detection☆48May 1, 2023Updated 2 years ago
- ☆37May 28, 2025Updated 9 months ago
- ☆114May 13, 2025Updated 9 months ago
- ☆36Apr 16, 2025Updated 10 months ago
- ☆50Apr 13, 2025Updated 10 months ago
- ☆11Aug 11, 2023Updated 2 years ago
- [CVPR2025] Official code for Lost in Translation Found in Context☆23Jan 14, 2026Updated last month
- A python script COMMAND LINE utility to AUTO GENERATE SUBTITLE FILE (using free Vosk Speech Recognition API) and TRANSLATED SUBTITLE FILE…☆11May 5, 2024Updated last year
- Russian phonetical transcription☆11Nov 19, 2025Updated 3 months ago
- Improving Continuous Sign Language Recognition with Adapted Image Models☆14Nov 10, 2025Updated 3 months ago
- Anki add-on that adds Pinyin and Zhuyin readings above Chinese characters in any field.☆12Sep 23, 2025Updated 5 months ago
- eCMU: An Efficient Phase-aware Framework for Music Source Separation with Conformer (IEEE RIVF23)☆10Oct 30, 2024Updated last year
- MSIT AI Fair (MAF)☆39Jan 8, 2026Updated last month