klassif-ai / react-pdf-ner-annotator
Annotate entities directly onto a PDF with automatic OCR for scanned PDFs
☆59 · Updated last year
Alternatives and similar repositories for react-pdf-ner-annotator:
Users who are interested in react-pdf-ner-annotator are comparing it to the libraries listed below.
- `pdfstructure` detects, splits, and organizes a document's text content into its natural structure as envisioned by the author. ☆102 · Updated 10 months ago
- A React component for annotating PDFs, powered by PDF.js and RecogitoJS ☆56 · Updated 10 months ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 · Updated last month
- Table Detection and Extraction Using Deep Learning (built in Python, using Luminoth, TensorFlow<2.0, and Sonnet) ☆198 · Updated 2 years ago
- Software that makes labeling PDFs easy. ☆405 · Updated 9 months ago
- Document Layout Analysis ☆359 · Updated last month
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆47 · Updated 6 months ago
- Keyword spaCy is a spaCy pipeline component for extracting keywords from text using cosine similarity. ☆11 · Updated last year
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆66 · Updated 4 years ago
- TableNet: Deep Learning model for end-to-end Table Detection and Tabular data extraction from Scanned Data Images. In modern times, more a… ☆54 · Updated 2 years ago
- A demo that shows how to build a semantic search experience with Typesense's vector search feature and Instantsearch.js ☆26 · Updated last year
- ☆77 · Updated 2 years ago
- ☆22 · Updated 11 months ago
- Tools for extracting figures, tables, text, etc. from a PDF document. ☆32 · Updated 4 years ago
- Parsing PDF tables using YOLOv3 ☆115 · Updated 3 years ago
- Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset ☆26 · Updated last year
- gcv2hocr converts Google Cloud Vision OCR output to hOCR to make a searchable PDF. ☆105 · Updated 4 years ago
- Repository for deepdoctection tutorial notebooks ☆42 · Updated 2 months ago
- Pipeline for converting PDFs to raw text with PaddleOCR ☆21 · Updated last year
- A web-based document annotation tool, powered by GPT-4 ☆258 · Updated last year
- spaCy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results) ☆54 · Updated 2 years ago
- ☆10 · Updated 2 years ago
- Detect textlines in document images ☆91 · Updated 8 months ago
- Handwritten text detection in document images using Detectron2 ☆19 · Updated 3 years ago
- Annotation layer for PDF.js ☆275 · Updated 4 months ago
- The scripts for training Detectron2-based Layout Models on popular layout analysis datasets ☆205 · Updated last year
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. ☆27 · Updated last year
- Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task… ☆267 · Updated 2 years ago
- 🚀 GUI for training spaCy models ☆54 · Updated 3 years ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆76 · Updated last year