axa-group/Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

☆6,177

Alternatives and similar repositories for Parsr

Users that are interested in Parsr are comparing it to the libraries listed below.

argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,044 · Updated this week
deepset-ai / haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a…
☆25,984 · Updated this week
neuml / txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
☆12,742 · Updated this week
Layout-Parser / layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆5,765 · Aug 15, 2024 · Updated last year
webis-de / small-text
Active Learning for Text Classification in Python
☆646 · May 24, 2026 · Updated last month
flairNLP / flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
☆14,384 · Oct 27, 2025 · Updated 8 months ago
NorskRegnesentral / skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
☆925 · Sep 2, 2024 · Updated last year
docarray / docarray
Represent, send, store and search multimodal data
☆3,123 · Mar 27, 2026 · Updated 3 months ago
ddangelov / Top2Vec
Top2Vec learns jointly embedded topic, document and word vectors.
☆3,104 · Nov 14, 2024 · Updated last year
koaning / doubtlab
Doubt your data, find bad labels.
☆515 · Jul 15, 2024 · Updated 2 years ago
MaartenGr / BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
☆7,750 · May 13, 2026 · Updated 2 months ago
ploomber / ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
☆3,623 · May 29, 2025 · Updated last year
orchest / orchest
Build data pipelines, the easy way 🛠️
☆4,135 · Jun 6, 2023 · Updated 3 years ago
Unstructured-IO / unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean…
☆15,182 · Updated this week
typesense / typesense
Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuz…
☆26,344 · Updated this week
deepchecks / deepchecks
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML va…
☆4,039 · Dec 28, 2025 · Updated 6 months ago
MaartenGr / PolyFuzz
Fuzzy string matching, grouping, and evaluation.
☆800 · Jul 10, 2025 · Updated last year
deepset-ai / FARM
Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
☆1,751 · Dec 20, 2023 · Updated 2 years ago
erre-quadro / spikex
SpikeX - SpaCy Pipes for Knowledge Extraction
☆403 · Jul 30, 2021 · Updated 4 years ago
koaning / bulk
A Simple Bulk Labelling Tool
☆599 · Jul 29, 2025 · Updated 11 months ago
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,775 · May 26, 2026 · Updated last month
mindee / doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. Ongo…
☆6,190 · Updated this week
PrefectHQ / prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
☆23,460 · Updated this week
online-ml / river
🌊 Online machine learning in Python
☆5,885 · Updated this week
kedro-org / kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and…
☆10,930 · Updated this week
HumanSignal / label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
☆27,904 · Updated this week
explosion / floret
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
☆343 · Apr 25, 2025 · Updated last year
JournalismAI-2021-Quotes / quote-extraction
Quote extraction for modular journalism (JournalismAI collab 2021)
☆230 · Feb 2, 2022 · Updated 4 years ago
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,763 · May 19, 2026 · Updated 2 months ago
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆10,580 · Updated this week
meilisearch / meilisearch
A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.
☆58,697 · Updated this week
jbesomi / texthero
Text preprocessing, representation and visualization from zero to hero.
☆2,910 · Aug 29, 2023 · Updated 2 years ago
jina-ai / serve
☁️ Build multimodal AI applications with cloud-native stack
☆21,862 · Mar 24, 2025 · Updated last year
cleanlab / cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data …
☆11,586 · Jan 13, 2026 · Updated 6 months ago
modin-project / modin
Modin: Scale your Pandas workflows by changing a single line of code
☆10,395 · Feb 10, 2026 · Updated 5 months ago
impira / docquery
An easy way to extract information from documents
☆1,774 · May 3, 2023 · Updated 3 years ago
simonw / datasette
An open source multi-tool for exploring and publishing data
☆11,300 · Jul 14, 2026 · Updated last week
cortexlabs / cortex
Production infrastructure for machine learning at scale
☆8,012 · Jun 12, 2024 · Updated 2 years ago
ljvmiranda921 / prodigy-pdf-custom-recipe
Custom recipe and utilities for document processing
☆201 · Jun 19, 2022 · Updated 4 years ago