marieai / marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processi…
☆68 Updated last month
Alternatives and similar repositories for marie-ai:
Users who are interested in marie-ai are comparing it to the libraries listed below.
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod… ☆20 Updated 2 years ago
- An unofficial implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆37 Updated last year
- ☆22 Updated last year
- TableNet: Deep Learning model for end-to-end Table Detection and Tabular data extraction from Scanned Data Images. In modern times, more a… ☆57 Updated 2 years ago
- Pipeline for converting PDFs to raw text with PaddleOCR ☆23 Updated last year
- Repository for deepdoctection tutorial notebooks ☆44 Updated 5 months ago
- Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8, and you can get the same (even better) result compared wi… ☆45 Updated 10 months ago
- Google Colab Demo of CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents ☆46 Updated 3 years ago
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation ☆135 Updated 3 months ago
- YOLO models trained on DocLayNet - power your Document Intelligence with Layout Analysis ☆103 Updated last month
- Tools for extracting figures, tables, text, etc. from a PDF document. ☆32 Updated 4 years ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆125 Updated last year
- ☆17 Updated 4 years ago
- Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks. ☆120 Updated last year
- Object Detection Model for Scanned Documents ☆91 Updated 2 months ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆22 Updated 4 years ago
- High-Performance Transformers for Table Structure Recognition Need Early Convolutions ☆43 Updated last year
- Nougat is Meta AI's revolutionary OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format. ☆22 Updated last year
- Full-fledged Data Exploration Tool for Label Studio ☆48 Updated last year
- Document Image Binarization ☆78 Updated 6 months ago
- Dense Article Dataset (DAD): A Benchmark Dataset for Document Layout Analysis ☆15 Updated 3 years ago
- ☆38 Updated 4 years ago
- An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis" ☆78 Updated last year
- Detect textlines in document images ☆93 Updated 11 months ago
- A line-based framework to detect and extract tabular data in JSON format from raster images using computer vision and Tesseract OCR. ☆57 Updated last year
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆50 Updated last month
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆105 Updated 8 months ago
- OnnxTR: a docTR (Document Text Recognition) library ONNX pipeline wrapper - for seamless, high-performing & accessible OCR ☆106 Updated 2 weeks ago
- Text and Layout Document Image Understanding. LayoutLM ☆23 Updated 3 years ago
- Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating. ☆190 Updated 2 months ago