lorien / ioweb

☆35

Related projects:

further-reading / scrapy-gui
A simple, Qt-WebEngine-powered web browser with built-in support for basic Scrapy web scraping.
☆106 Updated 3 months ago
scrapinghub / web-poet
Web scraping Page Objects core library
☆93 Updated 2 months ago
realslimshanky / Spider-Sense
A browser extension to monitor your spiders deployed on Scrapy Cloud.
☆15 Updated 3 years ago
scrapinghub / arche
Analyze scraped data
☆47 Updated 4 years ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆27 Updated 6 years ago
lorien / crawler
☆53 Updated this week
INNOVINATI / microwler
A micro-framework for asynchronous deep crawls and web scraping with Python
☆13 Updated last year
scrapy-plugins / scrapy-headless
☆29 Updated 3 years ago
chuanconggao / html2json
Lightweight library that converts an HTML webpage to JSON data using a template defined in JSON.
☆21 Updated 3 years ago
TeamHG-Memex / scrapy-crawl-once
Scrapy middleware that allows crawling only new content
☆79 Updated last year
scrapinghub / scrapy-autoextract
Zyte Automatic Extraction integration for Scrapy
☆55 Updated 2 years ago
scrapy / xtractmime
https://mimesniff.spec.whatwg.org/ implementation for Python
☆14 Updated 8 months ago
kadnan / ScrapeGen
A simple Python tool that generates a requests/bs4-based web scraper
☆26 Updated 2 years ago
HyperionGray / starbelly
Streaming web crawler with WebSocket API
☆44 Updated last year
scrapy / protego
A pure-Python robots.txt parser with support for modern conventions.
☆54 Updated 3 months ago
scrapinghub / scrapy-poet
Page Object pattern for Scrapy
☆119 Updated 2 months ago
scrapy-plugins / scrapy-magicfields
Scrapy middleware to add extra fields to items, such as timestamps, response fields, and spider attributes.
☆56 Updated 2 years ago
scrapy / itemloaders
Library to populate items using XPath and CSS with a convenient API
☆44 Updated 3 months ago
zytedata / python-zyte-api
Python client for Zyte API
☆19 Updated 3 months ago
skulltech / drymail
Makes sending emails easy and DRY — For Python 3.
☆220 Updated 3 years ago
FKLC / AnyAPI
AnyAPI is a library that helps you write any API wrapper with ease and in a Pythonic way.
☆132 Updated 2 years ago
EdmundMartin / Scrapio
Asyncio web crawling framework. Work in progress.
☆18 Updated last month
zytedata / zyte-autoextract
Python clients for Zyte AutoExtract API
☆39 Updated 2 years ago
scrapy-plugins / scrapy-pagestorage
A Scrapy extension to store request and response information in a storage service
☆26 Updated 2 years ago
scrapinghub / webpager
Paginating the web
☆37 Updated 10 years ago
mori-b / aioconnectors
Simple secure asynchronous message queue
☆20 Updated 2 months ago
croqaz / awesome-scrapy
🕶 Awesome list of Scrapy tools and libraries
☆54 Updated 4 years ago
ivbeg / newsworker
Advanced news feed extractor and finder library. Helps to automatically extract news from websites without RSS/Atom feeds
☆76 Updated last year
xtream1101 / scraperx
Library for scraping websites or APIs at any scale
☆53 Updated 7 months ago
wq / itertable
⇔ IterTable is a Pythonic API for iterating through tabular data formats, including CSV, XLSX, XML, and JSON.
☆51 Updated last year