shaadclt / Qwen2-VL-OCR-VQALinks
This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.
☆18Updated 8 months ago
Alternatives and similar repositories for Qwen2-VL-OCR-VQA
Users that are interested in Qwen2-VL-OCR-VQA are comparing it to the libraries listed below
Sorting:
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard.☆81Updated this week
- This repo is a packaged version of the Yolov9 model.☆89Updated last week
- Inference and fine-tuning examples for vision models from 🤗 Transformers☆154Updated 2 months ago
- Eye exploration☆27Updated 5 months ago
- An SDK for Transformers + YOLO and other SSD family models☆63Updated 5 months ago
- Torchreid-Pip: Packaged version of Torchreid☆13Updated 2 years ago
- autoAnnoter its a tool to auto annotate data using a exisiting models☆43Updated 11 months ago
- Vehicle speed estimation using YOLOv8☆30Updated last year
- Example of YOLOv8 Segmentation on Browser. It is powered by Onnx and served through JavaScript without any frameworks☆20Updated last year
- Using open-source LLM Llama2 by Meta on local CPU inference for document question-and-answer☆15Updated last year
- ☆113Updated 7 months ago
- Code from the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models"☆72Updated last month
- Ultralytics GitHub Actions☆39Updated 2 weeks ago
- OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless, high-performing & accessible OCR☆135Updated 3 weeks ago
- ☆55Updated last year
- Huggingface utilities for Ultralytics/YOLOv8☆86Updated last year
- A web utility to draw polygons and retrieve their coordinates for computer vision applications.☆78Updated 9 months ago
- Easy-to-use finetuned YOLOv8 models.☆202Updated 2 years ago
- Accurately locating each head's position in the crowd scenes is a crucial task in the field of crowd analysis. However, traditional densi…☆21Updated last year
- ☆26Updated last year
- This repository demonstrates the data preparation and fine-tuning the IDEFICS Vision Language Model.☆22Updated last year
- YOLOExplorer : Iterate on your YOLO / CV datasets using SQL, Vector semantic search, and more within seconds☆132Updated last week
- Fine tune Gemma 3 on an object detection task☆72Updated this week
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…☆81Updated last year
- Real-time pose estimation pipeline with 🤗 Transformers☆61Updated 5 months ago
- Evaluate custom and HuggingFace text-to-image/zero-shot-image-classification models like CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics inclu…☆53Updated 6 months ago
- ☆34Updated 8 months ago
- Low-latency ONNX and TensorRT based zero-shot classification and detection with contrastive language-image pre-training based prompts☆41Updated 10 months ago
- A high-performance library for detecting objects in images and videos, leveraging Rust's speed and safety. Optionally supports a gRPC API…☆31Updated 2 months ago
- OcSort-Pip: Packaged version of the OcSort repository☆15Updated 2 years ago