konfuzio-ai / konfuzio-sdk
Run OCR, extract information from documents, and classify them. You can also annotate documents and build custom NLP and computer vision models tailored to your specific use cases. Find code examples in the Tutorials section of dev.konfuzio.com and get inspiration from the Use Cases section of our blog: https://konfuzio.com/en/category/marketpl…
☆60 · Updated this week
Related projects:
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆41 · Updated last month
- Integrate AI-powered Document Analysis Pipelines ☆58 · Updated last week
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆61 · Updated 6 months ago
- A framework for converting natural language text inputs to corresponding Pandas, MongoDB, Kusto and Neo4j (Cypher) queries. ☆66 · Updated 4 months ago
- Data extraction with Donut ML model ☆52 · Updated last month
- A Python library for extracting text from PDFs without losing the formatting of the PDF content. ☆72 · Updated 2 years ago
- A Streamlit app for showing a TimelineJS about the history of Natural Language Processing ☆24 · Updated 10 months ago
- A Streamlit component for graph visualization ☆28 · Updated 2 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆63 · Updated 3 years ago
- Logical structure analysis for visually structured documents ☆80 · Updated 2 years ago
- Search PDFs using Jina, DocArray and Jina Hub ☆55 · Updated 2 years ago
- Highlight text in documents ☆73 · Updated 11 months ago
- Streamlit Named Entity Recognition (NER) annotation custom component ☆38 · Updated last year
- `pdfstructure` detects, splits and organizes the document's text content into its natural structure as envisioned by the author. ☆97 · Updated 5 months ago
- Viewer for the structure extracted by Grobid on PDF documents ☆34 · Updated last month
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆109 · Updated 8 months ago
- This PyTorch implementation of the LayoutLM paper by Microsoft demonstrates the sequence classification task using Hugging Face Transformers to cl… ☆31 · Updated 2 years ago
- Probabilistic key-value pair extraction using word weights from invoices (non-searchable PDF) ☆17 · Updated 3 years ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax. ☆57 · Updated 4 months ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆11 · Updated last month
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆70 · Updated 2 years ago
- A spaCy wrapper for GliNER ☆77 · Updated 2 months ago
- Nougat is Meta AI's OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format. ☆20 · Updated 11 months ago
- Extracting semi-structured data from PDFs on a large scale ☆50 · Updated 2 years ago
- OCRmyPDF EasyOCR plugin ☆44 · Updated 2 weeks ago
- NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, … ☆75 · Updated 6 months ago