Donghwa-KIM / audiotext-transformer
Cross-modal model between audio (MFCC) and text (KoBERT)
☆12 Jan 14, 2021 Updated 5 years ago
Alternatives and similar repositories for audiotext-transformer
Users that are interested in audiotext-transformer are comparing it to the libraries listed below
- This is a project to analyze KorQuAD 2.0 ☆23 Jun 22, 2022 Updated 3 years ago
- This is a project for Korean auto spacing ☆12 Aug 3, 2020 Updated 5 years ago
- Multimodal Transformer for Korean Sentiment Analysis with Audio and Text Features ☆28 Sep 7, 2021 Updated 4 years ago
- Pure python implementation of DARTS (Double ARray Trie System) ☆23 Dec 7, 2022 Updated 3 years ago
- ☆87 Dec 21, 2022 Updated 3 years ago
- Implementation of the BReG-NeXt architecture ☆22 Mar 24, 2023 Updated 2 years ago
- Notes compiled from a Python course on work automation ☆13 Oct 10, 2017 Updated 8 years ago
- This branch of Asteroid contains code for the vocal harmony and chamber ensemble separation related papers. ☆12 Nov 7, 2024 Updated last year
- This repository shows how to implement a basic model for multimodal entailment. ☆10 Aug 17, 2021 Updated 4 years ago
- 1st place solution to the DCASE 2020 - Task 5 - Urban Sound Tagging with Spatiotemporal Context ☆16 Dec 8, 2022 Updated 3 years ago
- Deployed a facial emotion recognition using neural network model which predicts the emotion from faces in images, videos and live feed fr… ☆11 May 2, 2021 Updated 4 years ago
- ☆12 May 3, 2022 Updated 3 years ago
- Reference implementation and test synthetic data for Sorted Center Time echo density measure for acoustic impulse responses ☆15 Mar 18, 2020 Updated 5 years ago
- This repository contains the speaker labeled information of VoxCeleb2 and LRS3 audio-visual datasets. (AAAI 2025) ☆12 Sep 6, 2024 Updated last year
- A fine multimodality fusion network :) ☆11 Aug 9, 2021 Updated 4 years ago
- Time frequency ridge detection based on relevant ridge portions ☆11 Aug 17, 2023 Updated 2 years ago
- ☆10 Nov 10, 2021 Updated 4 years ago
- This repository contains the official code for "Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention, Alignm… ☆12 Oct 9, 2024 Updated last year
- A CNN audio classifier via spectrogram images. ☆10 Jul 21, 2017 Updated 8 years ago
- ☆13 Feb 8, 2017 Updated 9 years ago
- Code and data recipes for the paper: Optimal Condition Training for Target Source Separation by Efthymios Tzinis, Gordon Wichern, Paris S… ☆14 Feb 15, 2023 Updated 3 years ago
- ZenPDF is a minimal PDF viewer for macOS. ☆14 Oct 13, 2024 Updated last year
- Unofficial Implementation of: "Exploring Randomly Wired Neural Networks for Image Recognition" ☆11 Nov 5, 2025 Updated 3 months ago
- Speaker overlap-aware Neural Diarization ☆12 Feb 13, 2023 Updated 3 years ago
- Official implementation of SBNet as described in "Single-branch Network for Multimodal Training". ☆12 Aug 28, 2023 Updated 2 years ago
- Acoustic Scene Classification using transfer learning on VGGish pre-trained model ☆11 Jan 3, 2018 Updated 8 years ago
- ☆11 Nov 5, 2025 Updated 3 months ago
- Python library for searching lyrics on Musixmatch, Genius and letras.mus.br. ☆10 Oct 10, 2024 Updated last year
- ☆13 Sep 26, 2023 Updated 2 years ago
- Examples of how to use API of MVSep service ☆28 Jun 21, 2025 Updated 7 months ago
- A Tensorflow implementation of Speech Emotion Recognition using Audio signals and Text Data ☆12 May 16, 2022 Updated 3 years ago
- ☆10 Nov 16, 2021 Updated 4 years ago
- The material is covered in my YouTube playlist "Data Wrangling with Python" available on YUNIKARN. ☆15 Dec 9, 2025 Updated 2 months ago
- Reproducible research code for the experiments presented in our article "Kara1k: a karaoke dataset for cover song identification and sing… ☆10 Jan 9, 2018 Updated 8 years ago
- Tutorial for building an API with Flask ☆10 Jun 22, 2020 Updated 5 years ago
- Human age estimation using deep neural networks (Keras) ☆13 Aug 10, 2023 Updated 2 years ago
- ☆11 May 18, 2022 Updated 3 years ago
- [CVPR 2025] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding ☆16 Oct 4, 2025 Updated 4 months ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition ☆11 Mar 14, 2025 Updated 11 months ago