marieai / marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processi…
☆70Updated this week
Alternatives and similar repositories for marie-ai
Users who are interested in marie-ai are comparing it to the libraries listed below.
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod…☆20Updated 2 years ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO…☆51Updated 4 months ago
- ☆22Updated last year
- Logical structure analysis for visually structured documents☆91Updated 2 years ago
- Search PDFs using Jina, DocArray and Jina Hub☆56Updated 3 years ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces☆39Updated 3 months ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an…☆23Updated 4 years ago
- Pipeline for converting PDFs to raw text with PaddleOCR☆23Updated last year
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.☆28Updated 2 years ago
- Data extraction with Donut ML model☆57Updated 11 months ago
- 🖍️ Highlight text in documents☆109Updated 3 months ago
- Full-fledged Data Exploration Tool for Label Studio☆49Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding☆128Updated last year
- An open source framework for Retrieval-Augmented System (RAG) that uses semantic search to help retrieve the expected results and generate h…☆21Updated last year
- Demo example of consumer goods categorization☆28Updated last year
- Nougat is Meta AI's revolutionary OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format.☆24Updated last year
- Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable p…☆82Updated 10 months ago
- Object Detection Model for Scanned Documents☆94Updated 4 months ago
- Repository for deepdoctection tutorial notebooks☆46Updated last month
- Parsee's PDF reader, specialized on the extraction of tables with numeric values and the accurate extraction and preservation of text-par…☆60Updated 3 weeks ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model☆22Updated 10 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment.☆52Updated 9 months ago
- Input text or image, get back matching image fashion results, using Jina, DocArray, and CLIP☆50Updated 2 years ago
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi…☆38Updated 4 months ago
- GLiNER model in a FastAPI microservice.☆45Updated 7 months ago
- YOLO models trained by DocLayNet - power your Document Intelligent by Layout Analysis☆126Updated this week
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six☆197Updated 7 months ago
- Google Colab Demo of CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents☆47Updated 3 years ago
- Integrated LLM-based document and data Q&A with knowledge graph visualization☆23Updated last year
- ☆22Updated 4 months ago