tapanBabbar9 / computer-vision
Experiments with CV
☆29Updated 4 months ago
Alternatives and similar repositories for computer-vision
Users that are interested in computer-vision are comparing it to the libraries listed below
- Small Multimodal Vision Model "Imp-v1-3b" trained using Phi-2 and Siglip.☆17Updated last year
- Speak (speech-to-text) to LLMs (Ollama) in any language - Streamlit app☆43Updated last year
- Hybrid-RAG is a hybrid Retrieval-Augmented Generation (RAG) model that leverages BERT for retrieving relevant documents and GPT-2 for gen…☆27Updated 3 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model☆22Updated 7 months ago
- Passively collect images for computer vision datasets on the edge.☆33Updated last year
- ☆21Updated 6 months ago
- ☆11Updated 11 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…☆11Updated 9 months ago
- ☆21Updated 11 months ago
- Real-Time Open-Vocabulary Object Detection☆13Updated last year
- Groq-Whisper Fast Transcription App built using Groq API and Streamlit.☆23Updated 7 months ago
- ☆14Updated 5 months ago
- ☆29Updated last year
- A service which wraps and chains video and audio Hugging Face Spaces together☆14Updated 8 months ago
- ☆29Updated 11 months ago
- On-device LLM inference using the MediaPipe LLM Inference API.☆21Updated last year
- Curated resources about automated GUI computer-use via LLMs. Highly opinionated, focus is on quality vs quantity.☆22Updated 5 months ago
- AI narrator☆15Updated last year
- An auto-coder that automatically fixes errors and improves code from a simple user prompt☆38Updated 4 months ago
- [WIP] AI Try-On plugin for Chrome☆27Updated last year
- ☆21Updated 6 months ago
- 🧠 Mem4AI: An LLM-friendly memory management library.☆25Updated 6 months ago
- An integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language.☆25Updated 2 months ago
- This repository guides you through creating images via Stable Diffusion using a smart virtual assistant like Google Assistant using Ope…☆35Updated 2 years ago
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ…☆33Updated 4 months ago
- ☆16Updated 6 months ago
- ☆16Updated last year
- Garvis: Realtime AI Voice Assistant☆38Updated 11 months ago
- ☆20Updated last year
- Voice agent using LiveKit (orchestration), Cartesia (TTS), OpenAI (LLM), and Deepgram (STT)☆16Updated 4 months ago