sieve-community / fast-asd
An optimized, production-ready implementation of active speaker detection
☆47 Updated 3 months ago
Related projects:
- Demo Python script app to interact with a llama.cpp server using the Whisper API, microphone, and webcam devices. ☆44 Updated 10 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆54 Updated last month
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆34 Updated last week
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ ☆75 Updated 11 months ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆65 Updated 4 months ago
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models. ☆64 Updated 9 months ago
- Cerule - A Tiny Mighty Vision Model ☆67 Updated 2 weeks ago
- A Hugging Face pipeline to train a GPT model based on the transcript obtained by the OpenAI Whisper model ☆15 Updated last year
- Incredibly descriptive audiovisual summaries for videos ☆39 Updated last month
- EdgeSAM model for use with Autodistill. ☆24 Updated 3 months ago
- Efficient approach to speaker diarization using voice characteristics extraction ☆56 Updated 4 months ago
- VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 V… ☆39 Updated last week
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX. ☆26 Updated this week
- Video+code lecture on building nanoGPT from scratch ☆64 Updated 3 months ago
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta ☆15 Updated last week
- Accurately locating each head's position in the crowd scenes is a crucial task in the field of crowd analysis. However, traditional densi… ☆19 Updated 6 months ago
- YOLOExplorer: Iterate on your YOLO / CV datasets using SQL, Vector semantic search, and more within seconds ☆119 Updated 2 weeks ago
- Maybe the new state of the art vision model? We'll see 🤷‍♂️ ☆154 Updated 8 months ago
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆21 Updated 3 weeks ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆77 Updated 3 months ago
- The code for some apps built with Sieve. ☆67 Updated last week
- GRDN.AI app for garden optimization ☆68 Updated 7 months ago
- Our idea is to combine the power of computer vision models and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from imag… ☆97 Updated last year
- A real-time video caption to conversation bot that captures frames, generates captions, and creates conversational responses using a Large … ☆118 Updated 11 months ago