scrapy/slybot

☆224

Alternatives and similar repositories for slybot

Users who are interested in slybot are comparing it to the libraries listed below.

scrapy / scrapely
A pure-python HTML screen-scraping library
☆1,884 · Apr 4, 2022 · Updated 4 years ago
pydepta / pydepta
A python implementation of DEPTA
☆84 · Jan 14, 2017 · Updated 9 years ago
redapple / parslepy
Python implementation of the Parsley language for extracting structured data from web pages
☆92 · Oct 26, 2017 · Updated 8 years ago
mvanveen / hncrawl
A scrapy-based Hacker News crawler.
☆151 · May 21, 2013 · Updated 13 years ago
julien-duponchelle / scrapy-dot
Export a graph of links between items crawled by scrapy in dot file format.
☆26 · Sep 24, 2011 · Updated 14 years ago
axiak / pybloomfiltermmap
Fast Python Bloom Filter using Mmap
☆745 · Nov 4, 2019 · Updated 6 years ago
scrapinghub / mdr
A python library to detect and extract listing data from HTML pages.
☆110 · May 5, 2017 · Updated 9 years ago
julien-duponchelle / scrapy-mongodb
Mongodb support for scrapy
☆101 · Mar 9, 2017 · Updated 9 years ago
ContinuumIO / PyDataAcademy
☆23 · Jun 25, 2026 · Updated 3 weeks ago
Parsely / schemato
Modularly extensible semantic metadata validator
☆85 · Dec 10, 2015 · Updated 10 years ago
mozilla / spade
Automated scraping markup+CSS from a list of relevant URLs, using a variety of user-agent strings. Provides reporting on usage of CSS pro…
☆22 · Aug 29, 2013 · Updated 12 years ago
rmax / scrapy-boilerplate
Small set of utilities to simplify writing Scrapy spiders.
☆50 · Jul 24, 2015 · Updated 10 years ago
andrewjw / celery-crawler
A Django based search engine powered by CouchDB, celery and whoosh.
☆48 · Dec 26, 2015 · Updated 10 years ago
joeyAghion / spidey-mongo
Implements a MongoDB back-end for Spidey (https://github.com/joeyAghion/spidey), a framework for crawling and scraping web sites.
☆15 · Nov 4, 2015 · Updated 10 years ago
coldnight / qxbot
A bot that bridges XMPP and QQ through the WebQQ interface; once bridged, messages can pass between XMPP and QQ groups.
☆18 · Apr 21, 2014 · Updated 12 years ago
elastic / elasticsearch-transport-memcached
memcached transport plugin for elasticsearch (STOPPED)
☆34 · Mar 15, 2023 · Updated 3 years ago
esnme / graphite-pymetrics
☆21 · Nov 23, 2012 · Updated 13 years ago
scrapinghub / scaws
Extensions for using Scrapy on Amazon AWS
☆32 · Dec 5, 2012 · Updated 13 years ago
scrapinghub / crawlera-tools
Crawlera tools
☆26 · Feb 9, 2016 · Updated 10 years ago
sontek / pyvore
Convore clone using gevent-socketio
☆19 · Jun 20, 2012 · Updated 14 years ago
scrapy / w3lib
Python library of web-related functions
☆419 · Updated this week
nabucosound / django-propaganda
Management of simple newsletters
☆56 · Aug 2, 2015 · Updated 10 years ago
jeanphix / Ghost.py
Webkit based scriptable web browser for python.
☆2,755 · Feb 24, 2024 · Updated 2 years ago
peterwaksman / Narwhal
Narwhal is a keyword and KEY NARRATIVE manager that creates language-aware classes. Because Narwhal does not use NLP it avoids complexity…
☆12 · Oct 16, 2018 · Updated 7 years ago
garysieling / chrome-scraper
Chrome Based Scraper
☆22 · Feb 7, 2013 · Updated 13 years ago
fedora-infra / datanommer
Put all the messages in the postgres
☆16 · Updated this week
nopper / twittomatic
Distributed twitter crawler in Python
☆25 · Nov 4, 2022 · Updated 3 years ago
kippt / django-api-boilerplate
Legos for your Django API
☆20 · May 24, 2013 · Updated 13 years ago
mrflip / monkeyshines
A simple, lightweight scraper for huffing API or feed data (rather than a page-by-page wander)
☆31 · Apr 10, 2010 · Updated 16 years ago
sunlightlabs / cluster-explorer
Tool for exploring clusters of similar documents in a text corpus.
☆16 · Apr 15, 2014 · Updated 12 years ago
klbostee / dumbo
Python module that allows one to easily write and run Hadoop programs.
☆1,030 · Jan 9, 2018 · Updated 8 years ago
datacratic / hintly_ensemble
☆21 · Dec 31, 2009 · Updated 16 years ago
alangrafu / visualbox
Visualization server based on LODSPeaKr
☆20 · Oct 8, 2013 · Updated 12 years ago
commoncrawl / commoncrawl-crawler
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)
☆226 · Dec 22, 2022 · Updated 3 years ago
mikej165 / scrapy-web-ui
scrapy-ui
☆16 · Feb 21, 2014 · Updated 12 years ago
turian / kea-service
KEA 5.0 (keyphrase extraction software), modified to be an XML-RPC service
☆42 · Jun 7, 2011 · Updated 15 years ago
TalkAboutLocal / local-news-engine
☆14 · Mar 9, 2017 · Updated 9 years ago
auvipy / celery-flower
Under heavy development now: Real time Celery monitoring with ASGI 3.0 +
☆175 · Oct 30, 2019 · Updated 6 years ago
svetlyak40wt / scrapy-useragents
A middleware to use a random user agent in the Scrapy crawler.
☆33 · Dec 15, 2012 · Updated 13 years ago