Parsely / serpextractLinks

Easy extraction of keywords and engines from search engine results pages (SERPs).

☆92

Alternatives and similar repositories for serpextract

Users that are interested in serpextract are comparing it to the libraries listed below

Sorting:

Webhose / article-date-extractor
Automatically extracts and normalizes an online article or blog post publication date
☆117Updated 2 years ago
lethain / extraction
A Python library for extracting titles, images, descriptions and canonical urls from HTML.
☆151Updated 5 years ago
scrapinghub / mdr
A python library detect and extract listing data from HTML page.
☆108Updated 8 years ago
scrapinghub / webpager
Paginating the web
☆37Updated 11 years ago
jamesturk / scrapelib
⛏ a library for scraping unreliable pages
☆211Updated last month
openeventdata / scraper
Scrapes sites. Gets news. Eventually events.
☆85Updated 9 years ago
pudo-attic / scrapekit
Python library with common functionality for writing web scrapers
☆102Updated 10 years ago
redapple / parslepy
Python implementation of the Parsley language for extracting structured data from web pages
☆92Updated 7 years ago
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29Updated 8 years ago
cocrawler / cocrawler
CoCrawler is a versatile web crawler built using modern tools and concurrency.
☆189Updated 3 years ago
seomoz / reppy
Modern robots.txt Parser for Python
☆195Updated last year
scrapy-plugins / scrapy-pagestorage
A scrapy extension to store requests and responses information in storage service
☆26Updated 3 years ago
kaflesudip / grabfeed
Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platform including Wordpress, …
☆21Updated 4 years ago
Parsely / schemato
Modularly extensible semantic metadata validator
☆84Updated 9 years ago
scrapinghub / shub
Scrapinghub Command Line Client
☆130Updated 2 months ago
pudo / normality
A tiny library for Python text normalisation. Useful for ad-hoc text processing.
☆155Updated last month
tryolabs / daywatch
E-commerce scraping and analytics platform.
☆53Updated 9 years ago
scrapinghub / aile
Automatic Item List Extraction
☆87Updated 9 years ago
TeamHG-Memex / Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
☆119Updated last year
nicodjimenez / bus_catchers
Python scripts for scraping bus ticket data from the websites of BoltBus, Greyhound, Megabus, GoBus, Amtrak, Peterpan, and EasternTravel.
☆38Updated 5 years ago
tasdikrahman / spammy
Spam filtering made easy for you
☆144Updated 6 years ago
cantabular / scraperwiki-python
ScraperWiki Python library for scraping and saving data
☆158Updated 2 years ago
TeamHG-Memex / autopager
Detect and classify pagination links
☆103Updated last week
matiasb / demiurge
PyQuery-based scraping micro-framework.
☆118Updated 3 years ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆27Updated 7 years ago
debrouwere / social-shares
A command-line and programmatic interface to various social sharecount endpoints.
☆30Updated 6 years ago
TeamHG-Memex / autologin-middleware
Scrapy middleware for the autologin
☆36Updated 7 years ago
TeamHG-Memex / sitehound-frontend
Site Hound (previously THH) is a Domain Discovery Tool
☆23Updated 4 years ago
TeamHG-Memex / extract-html-diff
extract difference between two html pages
☆32Updated 7 years ago
scrapinghub / webstruct
NER toolkit for HTML data
☆259Updated last year