marieai / marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processi…
☆72Updated last week
Alternatives and similar repositories for marie-ai
Users that are interested in marie-ai are comparing it to the libraries listed below
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod…☆20Updated 2 years ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO…☆52Updated 7 months ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an…☆22Updated 5 years ago
- Logical structure analysis for visually structured documents☆92Updated 3 years ago
- Pipeline for converting PDFs to raw text with PaddleOCR☆23Updated 2 years ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces☆39Updated 5 months ago
- ☆22Updated last year
- Repository for deepdoctection tutorial notebooks☆45Updated 4 months ago
- Demo example of consumer goods categorization☆28Updated last year
- Object Detection Model for Scanned Documents☆94Updated 7 months ago
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.☆29Updated 2 years ago
- Parsee's PDF reader, specialized in the extraction of tables with numeric values and the accurate extraction and preservation of text-par…☆64Updated last month
- Trained BERT and Word2Vec legal clause classifiers for SPACY using the Atticus Project's Open Source Contract Label Corpus☆13Updated 4 years ago
- Full-fledged Data Exploration Tool for Label Studio☆48Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding☆129Updated last year
- Search PDFs using Jina, DocArray and Jina Hub☆56Updated 3 years ago
- Visual similarity search engine demo with use of PyTorch Metric Learning and Qdrant☆12Updated 2 years ago
- 🖍️ Highlight text in documents☆109Updated 6 months ago
- Evaluation framework for document processing models and services.☆51Updated this week
- An open source framework for Retrieval-Augmented System (RAG) that uses semantic search to help retrieve the expected results and generate h…☆21Updated last year
- Universal text classifier for generative models☆25Updated last year
- YOLO models trained on DocLayNet - power your Document Intelligence with Layout Analysis☆139Updated 2 months ago
- Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable p…☆85Updated last year
- Use AI to personify books, so that you can talk to them 🙊☆18Updated 2 years ago
- 📃 A contracts clause summarization system using LLM and vector database☆22Updated 8 months ago
- Split and analyze text files using langchain and streamlit☆50Updated last year
- ☆28Updated last year
- Reproducing "Writing with Transformer" demo, using aitextgen/FastAPI in backend, Quill/React in frontend☆27Updated 4 years ago
- Data extraction with Donut ML model☆57Updated last year
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support of tabular…☆75Updated this week