nudge / schema
A Python implementation of SCHEMA - An Algorithm for Automated Product Taxonomy Mapping in E-commerce.
☆16 · Updated 10 years ago
Alternatives and similar repositories for schema:
Users who are interested in schema are comparing it to the libraries listed below.
- Exporters is an extensible export pipeline library that supports filters, transforms, and several sources and destinations ☆40 · Updated 9 months ago
- Site Hound (previously THH) is a domain discovery tool ☆23 · Updated 3 years ago
- Restrict crawl and scraping scope using matchers. ☆25 · Updated 8 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆55 · Updated last year
- A distributed system for mining Common Crawl using SQS, AWS EC2, and S3 ☆18 · Updated 10 years ago
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 · Updated 7 years ago
- Proxy-list management application for Django ☆23 · Updated 6 years ago
- Billy - The open source recurring billing system, powered by Balanced. ☆171 · Updated last year
- Find which links on a web page are pagination links ☆29 · Updated 8 years ago
- Extract the differences between two HTML pages ☆32 · Updated 6 years ago
- Traptor: a distributed Twitter feed ☆26 · Updated 2 years ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated 6 years ago
- Extract data from Craigslist.org using Python 3 and the pomp framework ☆37 · Updated 7 years ago
- A Scrapy pipeline that stores Scrapy items in a Solr server ☆19 · Updated 8 years ago
- A Scrapy pipeline that sends items to an Elasticsearch server ☆98 · Updated 7 years ago
- A Scrapy extension that writes crawled items to Kafka ☆30 · Updated 6 years ago
- Tweet Lake is a command-line interface to the Twitter Streaming API and a big-data project that extracts interesting statistics from a tweet corpus ☆20 · Updated 2 years ago
- Small set of utilities to simplify writing Scrapy spiders. ☆49 · Updated 9 years ago
- Source code for RudderStack's Event Query Generator tool. ☆11 · Updated 2 years ago
- Lightweight framework for collecting and aggregating event metrics as timeseries data ☆530 · Updated 7 years ago
- A Foursquare data scraper that gathers all venues within a specified geographic area. ☆39 · Updated 5 years ago
- Classify products into categories by their name with NLTK ☆28 · Updated 10 years ago
- A Scrapy extension that stores request and response information in a storage service ☆26 · Updated 2 years ago
- Tool to flatten a stream of JSON-like objects, configured via schema ☆33 · Updated 5 years ago
- Detect and classify pagination links ☆15 · Updated 4 years ago
- Scrapy middleware for autologin ☆37 · Updated 6 years ago
- Python video summarization. Visit the public API at www.shorten.tv (Edit: the domain expired and YouTube blocked it.) ☆80 · Updated 2 years ago
- A Gearman worker which cURLs to do work. ☆51 · Updated 10 years ago
- A history mixin with audit logging, record locking, and time travel for Flask-SQLAlchemy ☆19 · Updated last year
- Resize images on the fly using Flask, Zappa, Pillow, and opencv-python ☆18 · Updated 7 years ago