An optimized, production-ready implementation of active speaker detection
☆81May 29, 2024Updated last year
Alternatives and similar repositories for fast-asd
Users who are interested in fast-asd are comparing it to the libraries listed below.
- Incredibly descriptive audiovisual summaries for videos☆41Aug 2, 2024Updated last year
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)☆170Mar 23, 2025Updated 11 months ago
- ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'☆459Oct 23, 2023Updated 2 years ago
- The purpose of this repository is to discuss audio transformers☆14Mar 12, 2026Updated last week
- Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset☆72Jan 18, 2022Updated 4 years ago
- This repository demonstrates various examples using YOLO☆13Feb 9, 2024Updated 2 years ago
- MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models☆15May 14, 2024Updated last year
- Accurately locating each head's position in the crowd scenes is a crucial task in the field of crowd analysis. However, traditional densi…☆21Mar 16, 2024Updated 2 years ago
- Automatically turn your handwritten journal entries into a website using GPT3 OCR python and html☆13Dec 15, 2021Updated 4 years ago
- Streamlit-Based License Plate Recognition (LPR) App☆12Mar 26, 2025Updated 11 months ago
- ☆17Apr 22, 2024Updated last year
- Summarizing with LLMs: Using an LLM to understand GitHub issues without reading each post in detail.☆15Jul 22, 2024Updated last year
- ☆21Aug 21, 2024Updated last year
- Python app to sync Video Files to the beat of a song☆12Aug 5, 2019Updated 6 years ago
- The speaker-labeled information of LRW dataset, which is the outcome of the paper "Speaker-adaptive Lip Reading with User-dependent Paddi…☆10Oct 12, 2023Updated 2 years ago
- Soundscape Ecology Toolkit☆11Mar 25, 2016Updated 9 years ago
- ☆15May 13, 2024Updated last year
- Example code - use word embeddings to make emoji prediction smarter with context☆11Sep 14, 2018Updated 7 years ago
- Runpod WhisperX Docker Container Repo☆15Mar 10, 2024Updated 2 years ago
- Optimized Syncnet and Chinese enhanced version, EN and CN checkpoints released☆11Nov 8, 2021Updated 4 years ago
- Image perspective transformation and text recognition☆10Jun 26, 2020Updated 5 years ago
- A Dockerized Jupyter notebook environment with pre-installed audio machine learning tools.☆12Feb 28, 2019Updated 7 years ago
- Deep Audio Segmenter, unsupervised☆10Feb 20, 2026Updated last month
- Add Rain Streak Mask On Unpaired Image Using GAN☆10Sep 12, 2020Updated 5 years ago
- Simple playground chat app that interacts with OpenAI's functions with memory and custom tools.☆18Jul 11, 2023Updated 2 years ago
- [CVPR2025] KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation☆69Apr 8, 2025Updated 11 months ago
- repo for active speaker detection for media videos.☆31Nov 19, 2023Updated 2 years ago
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…☆21Feb 9, 2026Updated last month
- ☆12May 25, 2024Updated last year
- Official repo of the paper “AL-GTD: Deep Active Learning for Gaze Target Detection” (ACMMM2024)☆12Nov 29, 2024Updated last year
- Thermal Indoor Motion Dataset☆14Apr 27, 2023Updated 2 years ago
- Deep learning and standard machine learning methods are developed and compared in classifying audio samples from microphones deployed abo…☆11Jan 17, 2020Updated 6 years ago
- Created a fingerprint recognition system using a Siamese network via One-Shot Learning. It has a similar use case as that of a face-recognit…☆13Oct 19, 2020Updated 5 years ago
- Synthetic Faces High Quality - Text2Image (SFHQ-T2I) Dataset. 122,726 curated 1024x1024 synthetic face images☆17Oct 14, 2024Updated last year
- Bird Audio Detection challenge submission using an ensemble of convolutional neural networks☆14Dec 30, 2017Updated 8 years ago
- Demo repository for creating a custom chatbot powered by LLMs for Telegram and Whatsapp.☆15Jan 18, 2024Updated 2 years ago
- Code and dataset for NAACL 2022 paper "CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination" Hyounghun Kim, Abhay Zala, Mohi…☆16Nov 26, 2022Updated 3 years ago
- PyTorch implementation of an SRGAN with regularization loss to stabilize GAN training. Work presented at the Japanese conference MIRU.☆12Oct 17, 2018Updated 7 years ago
- ☆15Apr 26, 2025Updated 10 months ago