marieai / marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processi…
☆79 Updated this week
Alternatives and similar repositories for marie-ai
Users who are interested in marie-ai are comparing it to the libraries listed below.
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆53 Updated 9 months ago
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod… ☆20 Updated 3 years ago
- ☆22 Updated last year
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆23 Updated 5 years ago
- Pipeline for converting PDFs to raw text with PaddleOCR ☆23 Updated 2 years ago
- Search PDFs using Jina, DocArray and Jina Hub ☆57 Updated 3 years ago
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. ☆29 Updated 2 years ago
- Logical structure analysis for visually structured documents ☆95 Updated 3 years ago
- Demo example of consumer goods categorization ☆30 Updated 2 years ago
- Repository for deepdoctection tutorial notebooks ☆48 Updated 6 months ago
- Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable p… ☆87 Updated last year
- Object Detection Model for Scanned Documents ☆93 Updated 9 months ago
- Full-fledged Data Exploration Tool for Label Studio ☆48 Updated last year
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆57 Updated 11 months ago
- H&M Fashion Image similarity search with Weaviate and DocArray ☆43 Updated last year
- Evaluation framework for document processing models and services. ☆59 Updated last week
- Input text or image, get back matching image fashion results, using Jina, DocArray, and CLIP ☆49 Updated 3 years ago
- Data extraction with Donut ML model ☆57 Updated last year
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 Updated 7 months ago
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆41 Updated 9 months ago
- Source code of the food discovery demo built on top of Qdrant ☆48 Updated 2 years ago
- A framework for converting natural language text inputs to corresponding Pandas, MongoDB, Kusto and Neo4j (Cypher) queries. ☆92 Updated last year
- YOLO models trained by DocLayNet - power your Document Intelligent by Layout Analysis ☆145 Updated 4 months ago
- A framework for high-fidelity retrieval augmented generation in industrial knowledge bases. Integrates jargon identification, context rec… ☆35 Updated last month
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆131 Updated last year
- Small python package to measure OCR quality and other related metrics. ☆25 Updated last year
- Build document-native LLM applications ☆55 Updated last year
- 🖍️ Highlight text in documents ☆110 Updated 8 months ago
- Universal text classifier for generative models ☆25 Updated last year
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆73 Updated last year