redapple/parslepy

Python implementation of the Parsley language for extracting structured data from web pages

☆92

Alternatives and similar repositories for parslepy

Users who are interested in parslepy are comparing it to the libraries listed below.

rmax / scrapydo
Crochet-based blocking API for Scrapy.
☆47 · Feb 24, 2017 · Updated 9 years ago
TeamHG-Memex / extract-html-diff
extract difference between two html pages
☆33 · Apr 8, 2026 · Updated 3 months ago
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29 · Jan 12, 2017 · Updated 9 years ago
TeamHG-Memex / autopager
Detect and classify pagination links
☆107 · Apr 8, 2026 · Updated 3 months ago
scrapy / scrapely
A pure-python HTML screen-scraping library
☆1,884 · Apr 4, 2022 · Updated 4 years ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆28 · Apr 8, 2026 · Updated 3 months ago
povilasb / scrapy-html-storage
Scrapy downloader middleware that stores response HTMLs to disk.
☆18 · Apr 14, 2026 · Updated 3 months ago
scrapy / slybot
☆224 · Apr 27, 2015 · Updated 11 years ago
rmax / databrewer
The missing datasets manager. Like Homebrew but for datasets. CLI tool to search and discover datasets!
☆41 · May 29, 2017 · Updated 9 years ago
ssteuteville / scrapyz
"Scrape Easy" - an extension of the Scrapy framework.
☆185 · Aug 13, 2016 · Updated 9 years ago
ArturGaspar / scrapy-qtwebkit
☆13 · Dec 4, 2019 · Updated 6 years ago
scrapinghub / aile
Automatic Item List Extraction
☆85 · Jun 15, 2016 · Updated 10 years ago
scrapy / w3lib
Python library of web-related functions
☆419 · Updated this week
aGHz / structominer
Data scraping for a more civilized age
☆17 · Jun 12, 2014 · Updated 12 years ago
fizx / pyparsley
python binding for parsley
☆40 · Jan 20, 2013 · Updated 13 years ago
Parsely / schemato
Modularly extensible semantic metadata validator
☆85 · Dec 10, 2015 · Updated 10 years ago
Fantomas42 / mots-vides
Python library for managing stop words in many languages.
☆12 · May 11, 2015 · Updated 11 years ago
gutomaia / inventwithpython
Book Invent With Python
☆23 · Apr 4, 2012 · Updated 14 years ago
nyov / scrapyext
scrapy-extras -- a collection of code samples and modules for the Scrapy framework.
☆14 · Dec 14, 2020 · Updated 5 years ago
rochacbruno-archive / scrapy_model
A helper to create web scrapers using scrapy selector in a Model based structure
☆57 · Dec 26, 2022 · Updated 3 years ago
llonchj / scrapy-sentry
Sentry component for Scrapy
☆84 · Aug 21, 2023 · Updated 2 years ago
scrapinghub / scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
☆26 · Jun 8, 2016 · Updated 10 years ago
TeamHG-Memex / undercrawler
A generic crawler
☆81 · Apr 8, 2026 · Updated 3 months ago
scrapinghub / skinfer
Skinfer is a tool for inferring and merging JSON schemas
☆141 · Apr 24, 2024 · Updated 2 years ago
iopipe / lambda-runtime-pypy3.5
AWS Lambda Runtime for PyPy 3.5
☆18 · Dec 11, 2018 · Updated 7 years ago
julien-duponchelle / scrapy-graphite
Output scrapy statistics to graphite/carbon
☆54 · Mar 9, 2013 · Updated 13 years ago
sv24-archive / charade
NO LONGER MAINTAINED. USE chardet/chardet. Fork of chardet to support Python 2 and 3 in one code base.
☆56 · Jan 2, 2018 · Updated 8 years ago
edsu / microdata
python library for extracting html microdata
☆168 · May 8, 2023 · Updated 3 years ago
timbertson / unfluff
[abandoned] statistical HTML content extraction in python
☆18 · Jan 12, 2011 · Updated 15 years ago
scrapy-plugins / scrapy-dotpersistence
A scrapy extension to sync `.scrapy` folder to an S3 bucket
☆18 · Mar 28, 2022 · Updated 4 years ago
scrapinghub / aduana
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe…
☆54 · May 21, 2024 · Updated 2 years ago
scrapy / parsel
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
☆1,345 · Updated this week
notconfusing / cocytus
produce a stream of citation data coming off wikimedia
☆12 · Mar 28, 2017 · Updated 9 years ago
LukeMathWalker / cargo-manifest
Fork to fix some serialization issues.
☆19 · Apr 12, 2026 · Updated 3 months ago
OlivierBlanvillain / crawler
Blog crawler for the blogforever project.
☆23 · Jan 31, 2014 · Updated 12 years ago
globocom / gifv-player
Javascript library for playing video files with gif fallback
☆26 · Jun 8, 2015 · Updated 11 years ago
rmax / scrapy-boilerplate
Small set of utilities to simplify writing Scrapy spiders.
☆50 · Jul 24, 2015 · Updated 11 years ago
scrapy-plugins / scrapy-monkeylearn
A Scrapy pipeline to categorize items using MonkeyLearn
☆38 · Apr 28, 2017 · Updated 9 years ago
fizx / parsley
Parsley is a simple language for extracting structured data from web pages. Parsley consists of a powerful selector language wrapped wit…
☆904 · Aug 30, 2015 · Updated 10 years ago