shaadclt / Qwen2-VL-OCR-VQA
This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.
☆22 Updated last year
Alternatives and similar repositories for Qwen2-VL-OCR-VQA
Users who are interested in Qwen2-VL-OCR-VQA are comparing it to the libraries listed below:
- Using open-source LLM Llama2 by Meta on local CPU inference for document question-and-answer ☆15 Updated 2 years ago
- An unofficial implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆37 Updated 2 years ago
- Our idea is to combine the power of computer vision models and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from imag… ☆116 Updated 2 years ago
- Synthetic identity documents dataset ☆28 Updated 7 months ago
- Inference and fine-tuning examples for vision models from 🤗 Transformers ☆162 Updated 2 months ago
- A Python-based algorithm for ID card rectification ☆51 Updated last year
- Supporting code for: Video Enriched Retrieval Augmented Generation Using Aligned Video Captions ☆31 Updated last year
- Fine-tune Gemma 3 on an object detection task ☆87 Updated 3 months ago
- This repository contains a Multimodal Retrieval-Augmented Generation (RAG) Pipeline that integrates images, audio, and text for advanced … ☆23 Updated 9 months ago
- Vehicle speed estimation using YOLOv8 ☆29 Updated last year
- ☆113 Updated 11 months ago
- This repository demonstrates various examples using YOLO ☆13 Updated last year
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆84 Updated last year
- Object Detection Model for Scanned Documents ☆94 Updated 7 months ago
- 6D Rotation Representation for Unconstrained Head Pose Estimation ☆15 Updated 2 months ago
- YOLO models trained on DocLayNet: power your document intelligence with layout analysis ☆139 Updated 2 months ago
- Applying perspective transformations to 2D images. ☆38 Updated 2 years ago
- Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8, and you can get the same (even better) result compared wi… ☆51 Updated last year
- This repository shares current progress in transformer-based optical character recognition (OCR). Contributions are welcome. ☆55 Updated 2 years ago
- Code and pre-trained models for detecting spoofing attacks from images. ☆39 Updated 4 years ago
- ToRoLaMa: The Vietnamese Instruction-Following and Chat Model ☆24 Updated last year
- A component that allows you to annotate an image with points and boxes. ☆21 Updated last year
- Notebooks using the Neural Magic libraries 📓 ☆39 Updated last year
- Model for document segmentation trained on the midv-500-models dataset. ☆78 Updated 4 years ago
- End-to-end face detection, cropping, norm estimation, and landmark detection in a single ONNX model ☆80 Updated 2 years ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆37 Updated 2 years ago
- In this repository, I present a retail store item detector using YOLOv5 ☆117 Updated 3 years ago
- Real-time detection of documents in images ☆83 Updated last year
- Example of YOLOv8 segmentation in the browser. It is powered by ONNX and served through JavaScript without any frameworks ☆21 Updated last year
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆12 Updated last year