scrapinghub / portia2code

☆50

Alternatives and similar repositories for portia2code

Users that are interested in portia2code are comparing it to the libraries listed below

redapple / parslepy
Python implementation of the Parsley language for extracting structured data from web pages
☆92 Updated 8 years ago
scrapinghub / scrapylib
Collection of Scrapy utilities (extensions, middlewares, pipelines, etc.)
☆32 Updated 7 years ago
scrapinghub / python-scrapinghub
A client interface for Scrapinghub's API
☆205 Updated last month
scrapy / loginform
Fill HTML login forms automatically
☆276 Updated last year
Parsely / serpextract
Easy extraction of keywords and engines from search engine results pages (SERPs).
☆92 Updated last month
scrapinghub / mdr
A Python library to detect and extract listing data from HTML pages.
☆108 Updated 8 years ago
scrapinghub / aile
Automatic Item List Extraction
☆87 Updated 9 years ago
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29 Updated 8 years ago
scrapinghub / webpager
Paginating the web
☆37 Updated 11 years ago
scrapy-plugins / scrapy-magicfields
Scrapy middleware to add extra fields to items, such as timestamps, response fields, and spider attributes.
☆56 Updated 3 years ago
un33k / python-emailahoy
Checks if an email address is real
☆105 Updated 4 years ago
adamlwgriffiths / amazon_scraper
Provides content not accessible through the standard Amazon API
☆236 Updated 8 years ago
rafaelcapucho / scrapy-eagle
Scrapy Eagle is a tool that allows us to run any Scrapy-based project in a distributed fashion and monitor how it is going and how many…
☆24 Updated 5 years ago
matiasb / demiurge
PyQuery-based scraping micro-framework.
☆118 Updated 3 years ago
scrapinghub / shub
Scrapinghub Command Line Client
☆130 Updated 3 weeks ago
julien-duponchelle / scrapy-elasticsearch
A Scrapy pipeline that sends items to an Elasticsearch server
☆98 Updated 7 years ago
ponyriders / django-amazon-price-monitor
Monitors prices of Amazon products via Product Advertising API
☆156 Updated 6 years ago
corywalker / selenium-crawler
Sometimes sites make crawling hard. Selenium-crawler uses selenium automation to fix that.
☆126 Updated 12 years ago
brandicted / scrapy-webdriver
☆143 Updated 10 years ago
ljanyst / scrapy-do
A daemon for scheduling Scrapy spiders
☆66 Updated 4 years ago
TeamHG-Memex / scrapy-crawl-once
Scrapy middleware that allows crawling only new content
☆79 Updated 3 years ago
tonywangcn / scaleable-crawler-with-docker-cluster
A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours on a single machine
☆97 Updated last year
sebdah / scrapy-mongodb
MongoDB pipeline for Scrapy. This module supports both MongoDB in standalone setups and replica sets. scrapy-mongodb will insert the item…
☆358 Updated 4 years ago
scrapy-plugins / scrapy-monkeylearn
A Scrapy pipeline to categorize items using MonkeyLearn
☆37 Updated 8 years ago
rmax / scrapy-inline-requests
A decorator to write coroutine-like spider callbacks.
☆110 Updated 2 years ago
scrapinghub / aduana
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe…
☆55 Updated last year
ssteuteville / scrapyz
"Scrape Easy" - an extension of the Scrapy framework.
☆186 Updated 9 years ago
scrapy-plugins / scrapy-deltafetch
Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls
☆276 Updated 9 months ago
cnu / scrapy-random-useragent
Scrapy Middleware to set a random User-Agent for every Request.
☆202 Updated 6 years ago
lethain / extraction
A Python library for extracting titles, images, descriptions and canonical urls from HTML.
☆151 Updated 5 years ago