zytedata/html-text

zytedata / html-text

☆20

Alternatives and similar repositories for html-text

Users who are interested in html-text are comparing it to the libraries listed below.

zytedata / zyte-spider-templates
Spider templates for automatic crawlers.
☆35 · Mar 26, 2026 · Updated 3 months ago
scrapy / pypydispatcher
A fork of http://pydispatcher.sourceforge.net/ with PyPy support
☆16 · Jul 3, 2017 · Updated 9 years ago
scrapoxy / scrapy-mcp-server
MCP server that enables self-healing automatic repair of Scrapy spiders. When websites change, your scrapers fix themselves.
☆17 · Nov 9, 2025 · Updated 8 months ago
scrapinghub / web-poet
Web scraping Page Objects core library
☆107 · Jul 10, 2026 · Updated last week
seagatesoft / webdext
Intelligent Web Data Extractor
☆74 · Dec 5, 2022 · Updated 3 years ago
stummjr / scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
☆18 · Apr 11, 2020 · Updated 6 years ago
croqaz / Stones
🗿Stones: Persistent key-value containers, compatible with Python dict
☆17 · Jul 15, 2024 · Updated 2 years ago
rmax / scrapydo
Crochet-based blocking API for Scrapy.
☆47 · Feb 24, 2017 · Updated 9 years ago
rmax / databrewer
The missing datasets manager. Like Homebrew but for datasets. A CLI tool to search and discover datasets!
☆41 · May 29, 2017 · Updated 9 years ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆28 · Apr 8, 2026 · Updated 3 months ago
rkrzr / dataset-popular
A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked up semantic blocks.
☆15 · Feb 9, 2014 · Updated 12 years ago
nlpub / russe
RUSSE: Russian Semantic Evaluation.
☆15 · Mar 1, 2022 · Updated 4 years ago
ShinyTrinkets / twofold.ts
TwoFold (2✂︎f). Text files breathe fire.
☆23 · Jan 28, 2026 · Updated 5 months ago
lopuhin / kaggle-rsna-2019
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/
☆19 · Oct 20, 2019 · Updated 6 years ago
TeamHG-Memex / html-text
Extract text from HTML
☆135 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / tor-proxy
A Tor SOCKS proxy Docker image
☆12 · Apr 8, 2026 · Updated 3 months ago
dialogue-evaluation / morphoRuEval-2017
☆50 · Nov 20, 2017 · Updated 8 years ago
zytedata / clear-html
Remove DIVs, style stuff and normalize HTML preserving structure information
☆14 · Oct 24, 2025 · Updated 8 months ago
develer-staff / qt-pyqt-sdk-builder
Create your custom Qt + PyQt SDK for multiple platforms
☆10 · Jun 7, 2019 · Updated 7 years ago
bdarnell / auto2to3
Wrapper to run 2to3 automatically at import time
☆13 · Dec 9, 2011 · Updated 14 years ago
dholth / hello-pyrust
A “Hello World” of calling Rust code from a Python program with CFFI, in order to show packaging issues
☆11 · Jul 14, 2016 · Updated 10 years ago
Parsely / schemato
Modularly extensible semantic metadata validator
☆85 · Dec 10, 2015 · Updated 10 years ago
bitmakerla / estela
estela, an elastic web scraping cluster 🕸
☆202 · Updated this week
acdha / django-performance-tools
EXPERIMENTAL Django performance monitoring utilities
☆15 · Nov 5, 2013 · Updated 12 years ago
sabtvg / nabu
Big Scale Decision Making Tool
☆12 · May 11, 2020 · Updated 6 years ago
mkmik / metacontext
Playground for creating new statements and constructs in Python using import hooks
☆15 · Apr 1, 2024 · Updated 2 years ago
scrapy / xtractmime
https://mimesniff.spec.whatwg.org/ implementation for Python
☆13 · Jul 9, 2026 · Updated last week
SimonSapin / html5ever-python
Python bindings for html5ever, using CFFI
☆39 · Nov 9, 2017 · Updated 8 years ago
julianser / dlg-segmenter
Recurrent Neural Networks for Speaker and Turn Taking Classification
☆12 · Aug 29, 2018 · Updated 7 years ago
kmike / morphine
[experiment] CRF-based disambiguation engine for pymorphy2
☆10 · May 9, 2016 · Updated 10 years ago
scrapy-plugins / scrapy-monkeylearn
A Scrapy pipeline to categorize items using MonkeyLearn
☆38 · Apr 28, 2017 · Updated 9 years ago
stav / scrapybox
Scrapy GUI
☆12 · Feb 26, 2021 · Updated 5 years ago
corona10 / mimesniff
Pure Python mimesniff implementation of https://mimesniff.spec.whatwg.org
☆14 · Oct 24, 2020 · Updated 5 years ago
generalov / django-resubmit
Stateful widgets for Django upload
☆15 · Oct 3, 2016 · Updated 9 years ago
fcharlie / fcharlie.github.io
forcemz.net
☆10 · Mar 7, 2026 · Updated 4 months ago
divio / divio-cloud-docs
Divio Cloud documentation for developers
☆13 · Jul 20, 2022 · Updated 4 years ago
scrapinghub / product-extraction-benchmark
☆16 · Apr 10, 2026 · Updated 3 months ago
gabrielfalcao / dead-parrot
A djangoish RESTful framework in Python
☆16 · Feb 18, 2011 · Updated 15 years ago
amadeus / dbg
A simple and lightweight console.log replacement with some extra bonuses
☆15 · Feb 24, 2015 · Updated 11 years ago