marieai / marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processi…
☆80 · Updated 2 weeks ago
Alternatives and similar repositories for marie-ai
Users who are interested in marie-ai are comparing it to the repositories listed below
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆53 · Updated 10 months ago
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod… ☆20 · Updated 3 years ago
- Logical structure analysis for visually structured documents ☆94 · Updated 3 years ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆23 · Updated 5 years ago
- ☆22 · Updated last year
- Pipeline for converting PDFs to raw text with PaddleOCR ☆23 · Updated 2 years ago
- ☆15 · Updated last year
- Demo example of consumer goods categorization ☆30 · Updated 2 years ago
- Repository for deepdoctection tutorial notebooks ☆48 · Updated 2 weeks ago
- An open source framework for Retrieval-Augmented System (RAG) uses semantic search helps to retrieve the expected results and generate h… ☆22 · Updated last month
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. ☆29 · Updated 2 years ago
- Object Detection Model for Scanned Documents ☆93 · Updated 10 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆133 · Updated 2 years ago
- YOLO models trained by DocLayNet - power your Document Intelligent by Layout Analysis ☆147 · Updated 5 months ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 · Updated 8 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆23 · Updated last year
- Parsee's PDF reader, specialized on the extraction of tables with numeric values and the accurate extraction and preservation of text-par… ☆66 · Updated 3 weeks ago
- 🖍️ Highlight text in documents ☆111 · Updated 8 months ago
- H&M Fashion Image similarity search with Weaviate and DocArray ☆43 · Updated last year
- Split and analyze text files using langchain and streamlit ☆49 · Updated last year
- ☆51 · Updated last year
- Source code of the food discovery demo built on top of Qdrant ☆48 · Updated 2 years ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆115 · Updated last year
- ☆28 · Updated last year
- Data extraction with Donut ML model ☆57 · Updated last year
- Integrated LLM-based document and data Q&A with knowledge graph visualization ☆23 · Updated 2 years ago
- Search PDFs using Jina, DocArray and Jina Hub ☆57 · Updated 3 years ago
- Built with Fast Dash, this app uses Embedchain, which abstracts the entire process of loading and chunking datasets, creating embeddings,… ☆66 · Updated last year
- PDF text data extraction web app with OCR for scanned documents ☆95 · Updated last year
- Universal text classifier for generative models ☆24 · Updated last year