ChrizH / pdfstructure
`pdfstructure` detects, splits, and organizes the document's text content into its natural structure, as envisioned by the author.
☆102 Updated 10 months ago
Alternatives and similar repositories for pdfstructure:
Users who are interested in pdfstructure are comparing it to the libraries listed below.
- Logical structure analysis for visually structured documents ☆86 Updated 2 years ago
- ☆77 Updated 2 years ago
- multimodal document analysis ☆162 Updated 8 months ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆175 Updated last year
- Streamlit Named Entity Recognition (NER) annotation custom component ☆39 Updated 2 years ago
- The scripts for training Detectron2-based Layout Models on popular layout analysis datasets ☆205 Updated last year
- Parsing pdf tables using YOLOV3 ☆115 Updated 3 years ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆102 Updated 5 months ago
- A python library for extracting text from PDFs without losing the formatting of the PDF content. ☆76 Updated 3 years ago
- ☆57 Updated 3 years ago
- Publicly released code for the LAMBERT model ☆101 Updated 3 years ago
- Software that makes labeling PDFs easy. ☆405 Updated 9 months ago
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ☆96 Updated last year
- A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy. ☆105 Updated 10 months ago
- Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understan… ☆345 Updated 2 years ago
- ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,... ☆177 Updated 3 years ago
- A Python library aimed at dissecting and augmenting NER training data. ☆58 Updated last year
- This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with enti… ☆244 Updated last year
- RaKUn 2.0 - A fast keyword detection algorithm ☆65 Updated this week
- Research papers and code on information extraction from image/pdf ☆96 Updated 2 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆104 Updated 9 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆76 Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆119 Updated last year
- ☆92 Updated 2 years ago
- 💫 SpaCy wrapper for ConceptNet 💫 ☆89 Updated last year
- ☆38 Updated 3 years ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 Updated 10 months ago
- Mining Legal Arguments in Court Decisions - Data and software ☆66 Updated last year
- The official tool for transforming doccano format into common dataset formats. ☆106 Updated last year
- Data and additional information regarding the paper: Contract Discovery. Dataset and a Few-Shot Semantic Retrieval Challenge with Competi… ☆30 Updated 4 years ago