deanmalmgren/textract

extract text from any document. no muss. no fuss.

☆4,670

Alternatives and similar repositories for textract

Users who are interested in textract are comparing it to the libraries listed below.

michaelhelmick / lassie
Web Content Retrieval for Humans™
☆629 · Jul 30, 2022 · Updated 3 years ago
rspeer / python-ftfy
Fixes mojibake and other glitches in Unicode text, after the fact.
☆4,051 · Oct 30, 2024 · Updated last year
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,757 · May 19, 2026 · Updated 2 months ago
schematics / schematics
Python Data Structures for Humans™.
☆2,590 · Jul 12, 2023 · Updated 3 years ago
pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,002 · Mar 13, 2026 · Updated 4 months ago
codelucas / newspaper
newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:
☆15,114 · Updated this week
mahmoud / boltons
🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library.…
☆6,906 · Updated this week
coleifer / micawber
a small library for extracting rich content from urls
☆681 · Updated this week
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,121 · Jun 30, 2026 · Updated 3 weeks ago
miso-belica / sumy
Module for automatic summarization of text documents and HTML pages.
☆3,695 · Updated this week
fastmonkeys / stellar
Fast database snapshot and restore tool for development
☆3,853 · Dec 13, 2024 · Updated last year
euske / pdfminer
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
☆5,283 · Dec 7, 2022 · Updated 3 years ago
fengsp / plan
Crontab jobs management in Python
☆1,183 · Jul 16, 2022 · Updated 4 years ago
piskvorky / gensim
Topic Modelling for Humans
☆16,464 · Nov 1, 2025 · Updated 8 months ago
seatgeek / fuzzywuzzy
Fuzzy String Matching in Python
☆9,262 · Feb 24, 2023 · Updated 3 years ago
lorien / grab
Web Scraping Framework
☆2,461 · Sep 19, 2025 · Updated 10 months ago
sloria / TextBlob
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
☆9,541 · Updated this week
chrismattmann / tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
☆1,661 · Jul 1, 2026 · Updated 2 weeks ago
chartbeat-labs / textacy
NLP, before and after spaCy
☆2,239 · Sep 22, 2023 · Updated 2 years ago
clips / pattern
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
☆8,856 · Jun 10, 2024 · Updated 2 years ago
yhat / db.py
db.py is an easier way to interact with your databases
☆1,217 · Aug 2, 2021 · Updated 4 years ago
google / python-fire
Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.
☆28,218 · Jul 1, 2026 · Updated 3 weeks ago
facebookresearch / fastText
Library for fast text representation and classification.
☆26,552 · Mar 22, 2024 · Updated 2 years ago
epochjs / epoch
A general purpose, real-time visualization library.
☆4,951 · Feb 14, 2019 · Updated 7 years ago
mailgun / flanker
Python email address and Mime parsing library
☆1,651 · Apr 8, 2026 · Updated 3 months ago
cayleygraph / cayley
An open-source graph database
☆15,050 · May 5, 2026 · Updated 2 months ago
tortilla / tortilla
Wrapping web APIs made easy.
☆1,240 · Dec 29, 2020 · Updated 5 years ago
hugapi / hug
Embrace the APIs of the future. Hug aims to make developing APIs as simple as possible, but no simpler.
☆6,884 · Jul 4, 2024 · Updated 2 years ago
Alir3z4 / html2text
Convert HTML to Markdown-formatted text.
☆2,169 · Oct 28, 2025 · Updated 8 months ago
dinedal / textql
Execute SQL against structured text like CSV or TSV
☆9,109 · Oct 22, 2023 · Updated 2 years ago
flairNLP / flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
☆14,383 · Oct 27, 2025 · Updated 8 months ago
jeanphix / Ghost.py
Webkit based scriptable web browser for python.
☆2,755 · Feb 24, 2024 · Updated 2 years ago
wrobstory / vincent
A Python to Vega translator
☆2,022 · Oct 25, 2016 · Updated 9 years ago
scrapy / scrapely
A pure-python HTML screen-scraping library
☆1,884 · Apr 4, 2022 · Updated 4 years ago
ruipgil / scraperjs
A complete and versatile web scraper.
☆3,716 · Oct 18, 2020 · Updated 5 years ago
psf / requests-html
Pythonic HTML Parsing for Humans™
☆13,828 · Apr 16, 2024 · Updated 2 years ago
huginn / huginn
Create agents that monitor and act on your behalf. Your agents are standing by!
☆49,656 · Updated this week
benhmoore / Knwl
Find Dates, Places, Times, and More. A .js library for parsing text for specific information.
☆5,257 · Sep 28, 2023 · Updated 2 years ago
madisonmay / CommonRegex
A collection of common regular expressions bundled with an easy to use interface.
☆1,583 · Apr 20, 2023 · Updated 3 years ago