rorro6787 / img-desc-visually-impaired
Image description System for Impaired people
☆16 · Updated last year
Alternatives and similar repositories for img-desc-visually-impaired
Users who are interested in img-desc-visually-impaired are comparing it to the repositories listed below.
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆69 · Updated last year
- ☆26 · Updated last year
- Eye exploration ☆31 · Updated 2 months ago
- Simple CogVLM client script ☆14 · Updated 2 years ago
- This repo gives a start for the docker. ☆35 · Updated 2 years ago
- This project breathes life into video characters by using AI to describe their personality and then chat with you as them. ☆49 · Updated last year
- Real-time object detection using Florence-2 with a user-friendly GUI. ☆30 · Updated 5 months ago
- 6D Rotation Representation for Unconstrained Head Pose Estimation ☆17 · Updated 5 months ago
- EdgeSAM model for use with Autodistill. ☆29 · Updated last year
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆85 · Updated last year
- ☆17 · Updated 2 years ago
- ☆21 · Updated last year
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆70 · Updated last year
- VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 V… ☆129 · Updated 7 months ago
- BH hackathon ☆14 · Updated last year
- ☆17 · Updated last year
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆12 · Updated last year
- Small Multimodal Vision Model "Imp-v1-3b" trained using Phi-2 and Siglip. ☆17 · Updated last year
- This repository holds the "Fully automated landmarking and facial segmentation on 3D photographs" files ☆30 · Updated 2 years ago
- ☆13 · Updated last year
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆24 · Updated 11 months ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆64 · Updated 8 months ago
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ ☆88 · Updated 2 years ago
- ☆25 · Updated 2 years ago
- Enhancement in Multimodal Representation Learning. ☆41 · Updated last year
- ☆29 · Updated 2 years ago
- OmegaViT (ΩViT) is a cutting-edge vision transformer architecture that combines multi-query attention, rotary embeddings, state space mod… ☆14 · Updated last week
- Vehicle speed estimation using YOLOv8 ☆32 · Updated last year
- ☆16 · Updated last year
- Visual RAG using less than 300 lines of code. ☆29 · Updated last year