sieve-community / fast-asd
An optimized, production-ready implementation of active speaker detection.
☆60 · Updated 10 months ago
Alternatives and similar repositories for fast-asd:
Users interested in fast-asd are comparing it to the libraries listed below.
- Incredibly descriptive audiovisual summaries for videos ☆40 · Updated 8 months ago
- Demo Python script app to interact with llama.cpp server using whisper API, microphone and webcam devices. ☆46 · Updated last year
- Efficient approach to speaker diarization using voice characteristics extraction ☆93 · Updated 11 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆63 · Updated 7 months ago
- EdgeSAM model for use with Autodistill. ☆26 · Updated 9 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆61 · Updated 3 weeks ago
- This package is the Python implementation of Deepgram's WebVTT and SRT formatting. Given a transcription, this package can return a valid… ☆19 · Updated 6 months ago
- ☆14 · Updated last year
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ ☆87 · Updated last year
- Repo for active speaker detection for media videos. ☆26 · Updated last year
- Speech To Speech: an effort for an open-sourced and modular GPT-4o ☆52 · Updated 5 months ago
- VLM-driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 V… ☆106 · Updated 6 months ago
- Cog wrapper for Vchitect/SEINE ☆37 · Updated last year
- Flask-based web application designed to compare text and image embeddings using the CLIP model. ☆22 · Updated last year
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta ☆16 · Updated 4 months ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆67 · Updated 10 months ago
- ☆62 · Updated 8 months ago
- A real-time video caption to conversation bot that captures frames, generates captions, and creates conversational responses using a Large … ☆123 · Updated last year
- ☆30 · Updated last year
- A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT ☆77 · Updated 5 months ago
- ☆202 · Updated 10 months ago
- Tools for merging pretrained large language models. ☆19 · Updated 9 months ago
- VideoDB Python SDK ☆65 · Updated this week
- A quality zero-shot lipsync pipeline built with MuseTalk, LivePortrait, and CodeFormer. ☆36 · Updated 6 months ago
- Video chat apps with computer vision filters built on top of Streamlit ☆50 · Updated last year
- 6D Rotation Representation for Unconstrained Head Pose Estimation ☆13 · Updated last year
- GPT-4V(ision) module for use with Autodistill. ☆26 · Updated 8 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆147 · Updated last month
- Transcription with speaker diarization pipeline ☆92 · Updated last year
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ☆135 · Updated last year