shaadclt / Qwen2-VL-OCR-VQA
This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.
☆20 Updated 10 months ago
Alternatives and similar repositories for Qwen2-VL-OCR-VQA
Users who are interested in Qwen2-VL-OCR-VQA are comparing it to the repositories listed below.
- Inference and fine-tuning examples for vision models from 🤗 Transformers ☆161 Updated 3 weeks ago
- ☆113 Updated 9 months ago
- ToRoLaMa: The Vietnamese Instruction-Following and Chat Model ☆24 Updated last year
- Using open-source LLM Llama2 by Meta on local CPU inference for document question-and-answer ☆15 Updated last year
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆84 Updated last year
- Notebooks using the Neural Magic libraries ☆39 Updated last year
- Object Detection Model for Scanned Documents ☆95 Updated 5 months ago
- Real-time pose estimation pipeline with 🤗 Transformers ☆63 Updated 6 months ago
- YOLO models trained by DocLayNet - power your Document Intelligent by Layout Analysis ☆132 Updated last month
- Our idea is to combine the power of computer vision model and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from imag… ☆117 Updated 2 years ago
- A high-performance library for detecting objects in images and videos, leveraging Rust's speed and safety. Optionally supports a gRPC API… ☆32 Updated 4 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆64 Updated 10 months ago
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard. ☆87 Updated this week
- This repository contains examples of using PaliGemma for tasks such as object detection, segmentation, image captioning, etc. ☆22 Updated 6 months ago
- An unofficial Implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆37 Updated last year
- ☆34 Updated 10 months ago
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ☆34 Updated 8 months ago
- Multimodal AI agent with Llama 3.2: A Streamlit app that processes text, images, PDFs, and PPTs, integrating NIM microservices, Milvus, a… ☆125 Updated 11 months ago
- Fine tune Gemma 3 on an object detection task ☆79 Updated last month
- Synthetic identity documents dataset ☆23 Updated 6 months ago
- Vehicle speed estimation using YOLOv8 ☆30 Updated last year
- YOLOv5 Segmentation Right in The Browser Using onnxruntime-web ☆11 Updated 2 years ago
- Interactive Annotation using Segment Anything for fast and accurate segmentation ☆22 Updated 2 years ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆82 Updated 10 months ago
- This repo gives an introduction to how to make full working example to serve your model using asynchronous Celery tasks and FastAPI. 🔥 … ☆31 Updated last year
- Awesome LLM application repo ☆85 Updated 5 months ago
- A Streamlit component integrating Label Studio Frontend in Streamlit applications ☆78 Updated last year
- Text to PowerPoint Slide Generation using GPT LLM ☆34 Updated last year
- Ultralytics Notebooks ☆104 Updated last week
- A complete pipeline for fine-tuning YOLOv8 pose models with custom datasets. Supports automatic and semi-automatic annotation for efficie… ☆13 Updated 6 months ago