marieai / marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processi…
☆67 · Updated last week
Alternatives and similar repositories for marie-ai:
Users interested in marie-ai are comparing it to the libraries listed below:
- Tools for extracting figures, tables, and text from a PDF document. ☆32 · Updated 4 years ago
- TableNet: Deep Learning model for end-to-end Table Detection and Tabular data extraction from Scanned Data Images. In modern times, more a… ☆56 · Updated 2 years ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 · Updated last week
- An unofficial Implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆36 · Updated last year
- Repository for deepdoctection tutorial notebooks ☆43 · Updated 4 months ago
- DFKI Layout Detection for OCR-D ☆47 · Updated this week
- Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks. ☆120 · Updated last year
- Implementation of BertGrid: https://arxiv.org/abs/1909.04948 ☆30 · Updated last year
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆104 · Updated 7 months ago
- This library builds a graph representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆22 · Updated 4 years ago
- Dense Article Dataset (DAD): A Benchmark Dataset for Document Layout Analysis ☆15 · Updated 3 years ago
- Detect textlines in document images ☆92 · Updated 10 months ago
- DL models that take a document image file as input, locate the position of paragraphs, lines, images, etc. with their labels and confiden… ☆26 · Updated 4 years ago
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation ☆134 · Updated 3 months ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆49 · Updated 3 weeks ago
- An official implementation of paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis" ☆77 · Updated last year
- Example codebase for fine-tuning LayoutLMv3 on DocVQA ☆50 · Updated 2 years ago
- Google Colab Demo of CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents ☆46 · Updated 3 years ago
- OCR & Ground Truth Resources ☆75 · Updated 2 years ago
- Object Detection Model for Scanned Documents ☆90 · Updated last month
- CTE: Contextualized Table Extraction Dataset ☆17 · Updated 2 years ago
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆35 · Updated 3 weeks ago
- Document Image Binarization ☆78 · Updated 5 months ago
- DocILE: Document Information Localization and Extraction Benchmark ☆123 · Updated 10 months ago
- Research papers and code on information extraction from image/pdf ☆96 · Updated 2 years ago
- Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating. ☆185 · Updated last month
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆175 · Updated 2 years ago