sieve-community / fast-asd
an optimized, production-ready implementation of active speaker detection
☆58Updated 7 months ago
Alternatives and similar repositories for fast-asd:
Users who are interested in fast-asd are comparing it to the repositories listed below:
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆60Updated 5 months ago
- Convert your PDFs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient…☆33Updated last week
- Demo Python script app to interact with a llama.cpp server using the Whisper API, microphone, and webcam devices.☆46Updated last year
- Efficient approach to speaker diarization using voice characteristics extraction☆83Updated 8 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆53Updated last month
- EdgeSAM model for use with Autodistill.☆26Updated 7 months ago
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models.☆64Updated last year
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️☆80Updated last year
- Video+code lecture on building nanoGPT from scratch☆65Updated 7 months ago
- Incredibly descriptive audiovisual summaries for videos☆40Updated 5 months ago
- A project that optimizes Whisper for low latency inference using NVIDIA TensorRT☆69Updated 3 months ago
- A Hugging Face pipeline to train a GPT model based on the transcript obtained by the OpenAI Whisper model☆15Updated 2 years ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…☆10Updated 5 months ago
- Accurately locating each head's position in the crowd scenes is a crucial task in the field of crowd analysis. However, traditional densi…☆21Updated 10 months ago
- Notebooks using the Neural Magic libraries 📓☆41Updated 5 months ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o☆37Updated 3 months ago
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution.☆47Updated last month
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing☆66Updated 8 months ago
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX.☆40Updated 4 months ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…☆79Updated 7 months ago
- Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.☆105Updated 5 months ago
- Realtime Video and Audio Streaming with WebRTC and Gradio☆185Updated this week