sieve-community / fast-asd
An optimized, production-ready implementation of active speaker detection
☆54Updated 5 months ago
Related projects
Alternatives and complementary repositories for fast-asd
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models.☆65Updated 11 months ago
- Cog wrapper for Vchitect/SEINE☆37Updated 11 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆59Updated 3 months ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing☆65Updated 6 months ago
- GPT-4V(ision) module for use with Autodistill.☆25Updated 3 months ago
- Demo Python script app for interacting with a llama.cpp server using the Whisper API, a microphone, and a webcam.☆45Updated last year
- EdgeSAM model for use with Autodistill.☆25Updated 5 months ago
- Provides custom Gradio components that make diarization-based audio labeling easier and faster.☆45Updated 2 weeks ago
- Speech To Speech: an effort for an open-sourced and modular GPT-4o☆23Updated last month
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️☆77Updated last year
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration☆84Updated last month
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…☆10Updated 3 months ago
- Cerule - A Tiny Mighty Vision Model☆67Updated 2 months ago
- Incredibly descriptive audiovisual summaries for videos☆39Updated 3 months ago
- Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦☆61Updated last year
- Maybe the new state of the art vision model? We'll see 🤷‍♂️☆154Updated 10 months ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre…☆11Updated last month
- Portal hopping with Stable Diffusion 👾☆22Updated 11 months ago
- Video+code lecture on building nanoGPT from scratch☆64Updated 5 months ago
- A Hugging Face pipeline to train a GPT model based on the transcript obtained by the OpenAI Whisper model☆15Updated last year
- Fast Real-time Object Detection with High-Res Output https://x.com/_akhaliq/status/1840213012818329826 ☆52Updated last month
- Eye exploration☆22Updated this week