marieai / marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processi…
☆70 · Updated this week
Alternatives and similar repositories for marie-ai
Users who are interested in marie-ai are comparing it to the libraries listed below.
- ☆22 · Updated last year
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod… ☆20 · Updated 2 years ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆50 · Updated 2 months ago
- Nougat is Meta AI's revolutionary OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format. ☆23 · Updated last year
- ☆13 · Updated last year
- Repository for deepdoctection tutorial notebooks ☆45 · Updated 6 months ago
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. ☆27 · Updated 2 years ago
- OnnxTR: an ONNX pipeline wrapper for the docTR (Document Text Recognition) library, for seamless, high-performing & accessible OCR ☆114 · Updated this week
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 · Updated last month
- YOLO models trained on DocLayNet: power your document intelligence with layout analysis ☆112 · Updated 2 months ago
- Text and Layout Document Image Understanding. LayoutLM ☆23 · Updated 3 years ago
- Object Detection Model for Scanned Documents ☆93 · Updated 2 months ago
- Document Layout Analysis ☆376 · Updated 2 weeks ago
- Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks. ☆122 · Updated 2 years ago
- An unofficial implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆36 · Updated last year
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆105 · Updated 9 months ago
- ☆49 · Updated 10 months ago
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation ☆138 · Updated 2 weeks ago
- Pipeline for converting PDFs to raw text with PaddleOCR ☆23 · Updated last year
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆22 · Updated 4 years ago
- ☆22 · Updated 2 months ago
- Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating. ☆191 · Updated 3 months ago
- ☆11 · Updated last year
- Demo example of consumer goods categorization ☆28 · Updated last year
- Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents. ☆26 · Updated last year
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆52 · Updated 4 months ago
- Document Image Binarization ☆77 · Updated 7 months ago
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model" ☆72 · Updated last month
- Data extraction with Donut ML model ☆57 · Updated 9 months ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆177 · Updated 2 years ago