scrapinghub/skinfer

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/scrapinghub/skinfer)

scrapinghub / skinfer

Skinfer is a tool for inferring and merging JSON schemas

☆141

Alternatives and similar repositories for skinfer

Users that are interested in skinfer are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

scrapinghub / exporters
View on GitHub
Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations
☆39May 21, 2024Updated 2 years ago
scrapinghub / webpager
View on GitHub
Paginating the web
☆37Feb 11, 2014Updated 12 years ago
scrapinghub / aduana
View on GitHub
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe…
☆54May 21, 2024Updated 2 years ago
scrapinghub / webstruct
View on GitHub
NER toolkit for HTML data
☆259May 3, 2024Updated 2 years ago
scrapinghub / aile
View on GitHub
Automatic Item List Extraction
☆85Jun 15, 2016Updated 10 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
TeamHG-Memex / Formasaurus
View on GitHub
Formasaurus tells you the type of an HTML form and its fields using machine learning
☆121Apr 8, 2026Updated 3 months ago
scrapinghub / page_finder
View on GitHub
Find which links on a web page are pagination links
☆29Jan 12, 2017Updated 9 years ago
scrapy-plugins / scrapy-dotpersistence
View on GitHub
A scrapy extension to sync `.scrapy` folder to an S3 bucket
☆18Mar 28, 2022Updated 4 years ago
scrapinghub / scrapy-mosquitera
View on GitHub
Restrict crawl and scraping scope using matchers.
☆26Jun 8, 2016Updated 10 years ago
scrapinghub / python-hubstorage
View on GitHub
Deprecated HubStorage client library - please use python-scrapinghub>=1.9.0 instead
☆16Dec 5, 2016Updated 9 years ago
scrapinghub / autopager
View on GitHub
Detect and classify pagination links
☆15Sep 9, 2020Updated 5 years ago
pydepta / pydepta
View on GitHub
A python implementation of DEPTA
☆84Jan 14, 2017Updated 9 years ago
scrapinghub / shub
View on GitHub
Scrapinghub Command Line Client
☆129Updated this week
scrapinghub / extruct
View on GitHub
Extract embedded metadata from HTML markup
☆967Apr 1, 2026Updated 3 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
scrapy / w3lib
View on GitHub
Python library of web-related functions
☆419Updated this week
TeamHG-Memex / url-summary
View on GitHub
Show summary of a large number of URLs in a Jupyter Notebook
☆19Apr 8, 2026Updated 3 months ago
scrapinghub / scrapyrt
View on GitHub
HTTP API for Scrapy spiders
☆882Jun 29, 2026Updated last month
scrapinghub / js2xml
View on GitHub
Convert Javascript code to an XML document
☆188Mar 14, 2022Updated 4 years ago
rmax / scrapydo
View on GitHub
Crochet-based blocking API for Scrapy.
☆47Feb 24, 2017Updated 9 years ago
scrapinghub / python-scrapinghub
View on GitHub
A client interface for Scrapinghub's API
☆206Updated this week
redapple / parslepy
View on GitHub
Python implementation of the Parsley language for extracting structured data from web pages
☆92Oct 26, 2017Updated 8 years ago
xtannier / WebAnnotator
View on GitHub
WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…
☆48Dec 17, 2021Updated 4 years ago
TeamHG-Memex / scrapy-kafka-export
View on GitHub
Scrapy extension which writes crawled items to Kafka
☆31Apr 8, 2026Updated 3 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
stummjr / scrapy-fieldstats
View on GitHub
A Scrapy extension to log items coverage when the spider shuts down
☆18Apr 11, 2020Updated 6 years ago
scrapy-plugins / scrapy-jsonschema
View on GitHub
Scrapy schema validation pipeline and Item builder using JSON Schema
☆45Mar 26, 2021Updated 5 years ago
scrapinghub / adblockparser
View on GitHub
Python parser for Adblock Plus filters
☆202Feb 20, 2019Updated 7 years ago
scrapy / itemloaders
View on GitHub
Library to populate items using XPath and CSS with a convenient API
☆49Updated this week
aGHz / structominer
View on GitHub
Data scraping for a more civilized age
☆17Jun 12, 2014Updated 12 years ago
discourse-lab / DiscourseSegmenter
View on GitHub
A collection of various discourse segmenters
☆10Jun 30, 2017Updated 9 years ago
nyov / scrapyext
View on GitHub
scrapy-extras -- a collection of code samples and modules for the Scrapy framework.
☆14Dec 14, 2020Updated 5 years ago
rmax / scrapy-inline-requests
View on GitHub
A decorator to write coroutine-like spider callbacks.
☆109Dec 26, 2022Updated 3 years ago
es-shims / AggregateError
View on GitHub
ES Proposal spec-compliant shim for AggregateError
☆14Dec 30, 2025Updated 6 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
scrapinghub / docker-images
View on GitHub
☆33Oct 20, 2025Updated 9 months ago
TeamHG-Memex / autopager
View on GitHub
Detect and classify pagination links
☆107Apr 8, 2026Updated 3 months ago
scrapy / parsel
View on GitHub
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
☆1,347Updated this week
victormartinez / shub_cli
View on GitHub
A CLI for dealing with the features of ScrapingHub
☆16Apr 20, 2021Updated 5 years ago
rmax / databrewer
View on GitHub
The missing datasets manager. Like hombrew but for datasets. CLI-tool for search and discover datasets!
☆41May 29, 2017Updated 9 years ago
sendgridlabs / cookiecutter-flaskrestful
View on GitHub
Python Flask-RESTful template for cookiecutter
☆11Mar 31, 2016Updated 10 years ago
commonsearch / urlparse4
View on GitHub
Faster replacement for Python's urlparse module
☆46Apr 13, 2026Updated 3 months ago