shaadclt / Qwen2-VL-OCR-VQALinks
This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.
☆23Updated last year
Alternatives and similar repositories for Qwen2-VL-OCR-VQA
Users that are interested in Qwen2-VL-OCR-VQA are comparing it to the libraries listed below
Sorting:
- Synthetic identity documents dataset☆30Updated 9 months ago
- An unofficial Implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents☆37Updated 2 years ago
- Our idea is to combine the power of computer vision model and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from imag…☆118Updated 2 years ago
- Using open-source LLM Llama2 by Meta on local CPU inference for document question-and-answer☆15Updated 2 years ago
- A component that allows you to annotate an image with points and boxes.☆21Updated 2 years ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…☆85Updated last year
- Fine tune Gemma 3 on an object detection task☆94Updated 5 months ago
- Vehicle speed estimation using YOLOv8☆31Updated last year
- OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless, high-performing & accessible OCR☆164Updated last week
- Inference and fine-tuning examples for vision models from 🤗 Transformers☆162Updated 4 months ago
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard.☆93Updated last week
- A Gradio component that can be used to annotate images with bounding boxes.☆64Updated 2 months ago
- A collection of hand on notebook for LLMs practitioner☆51Updated 11 months ago
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models.☆65Updated 2 years ago
- This Repository demostrates various examples using YOLO☆13Updated last year
- Building LLMs from scratch following the book from S. Raschka☆32Updated 9 months ago
- Table detection (TD) and table structure recognition (TSR) using Yolov5/Yolov8, and you can get the same (even better) result compared wi…☆51Updated last year
- In this repository, I present a retail store item detector using YOLOv5☆119Updated 3 years ago
- ☆114Updated last year
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation☆153Updated 7 months ago
- End-to-end face detection, cropping, norm estimation, and landmark detection in a single onnx model☆82Updated 2 years ago
- This repo consists of the code as discussed in the Medium blog.☆17Updated 2 years ago
- YOLOExplorer : Iterate on your YOLO / CV datasets using SQL, Vector semantic search, and more within seconds☆138Updated 3 weeks ago
- Notebooks for fine tuning pali gemma☆117Updated 8 months ago
- Interactive Annotation using Segment Anything for fast and accurate segmentation☆22Updated 2 years ago
- Code and pre-trained models for detecting spoofing attacks from images.☆40Updated 5 years ago
- Machine Learning Project to identify an ID Card on an image☆49Updated 4 years ago
- Object Detection Model for Scanned Documents☆93Updated 9 months ago
- This repository contains examples of using PaliGemma for tasks such as object detection, segmentation, image captioning, etc.☆22Updated 10 months ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data☆23Updated last year