edx / pa11ycrawler

Python crawler (using Scrapy) that uses Pa11y to check accessibility of pages as it crawls.

☆18

Alternatives and similar repositories for pa11ycrawler

Users who are interested in pa11ycrawler compare it to the libraries listed below

0b01 / bodine
It finds best synonyms from Google Books when you press a hotkey
☆30 Updated 10 years ago
ghrecommender / ghrecommender-backend
GHRecommender - personalized recommendations for GitHub projects based on information about repositories starred by the user
☆26 Updated 2 years ago
ClimbsRocks / empythy
Automated NLP sentiment predictions - batteries included, or use your own data
☆18 Updated 7 years ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆27 Updated 7 years ago
helgeho / Web2Warc
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
☆25 Updated 7 years ago
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29 Updated 8 years ago
fiatjaf / washer
A whoosh-based CLI indexer and searcher for your files.
☆16 Updated 8 years ago
okfn / measure
Measure is scripts and conventions to build KPI dashboards for projects.
☆17 Updated 4 years ago
scrapinghub / webpager
Paginating the web
☆37 Updated 11 years ago
ryansb / bookmarkd
Markdown -> IPython conversion tool
☆15 Updated 10 years ago
N0taN3rd / simplechrome
Webrecorder's DevTools Protocol Automation Library
☆17 Updated 2 years ago
KamalaSowmya / DiscussionSummarization
Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas…
☆12 Updated 11 years ago
Apoc2400 / Reftag
Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more
☆22 Updated 8 years ago
trec-kba / streamcorpus-pipeline
framework for making streamcorpus data
☆11 Updated 8 years ago
shish / devtools-py
A Python client for Chrome's DevTools protocol / a headless chrome control library
☆15 Updated 6 years ago
hatnote / weeklypedia-history
All the reports and data powering http://weekly.hatnote.com
☆13 Updated this week
wordnik / serapis
Serapis is a sentence identifier and modeling pipeline / built for Wordnik
☆24 Updated 9 years ago
istresearch / traptor
Traptor -- A distributed Twitter feed
☆26 Updated 2 years ago
littlecolumns / little-geocoder
A tool to allow US addresses to be geocoded/georeferenced easily, without using Python or the command line or paid services or anything.
☆18 Updated 2 years ago
chuanconggao / TopSim
Efficiently search the most similar strings against the query in Python.
☆18 Updated last month
mozilla / miracle
☆13 Updated 6 years ago
internetarchive / trough
Trough: Big data, small databases.
☆42 Updated 11 months ago
xtannier / WebAnnotator
WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…
☆48 Updated 3 years ago
sloria / textfeel-web
An online sentiment analyzer built with Flask and TextBlob
☆15 Updated 11 years ago
osteele / ipython-secrets
A Python package that simplifies the use of secrets in a Jupyter notebook
☆21 Updated 3 years ago
NAMD / pypln.backend
Pipeline for distributed Natural Language Processing, made in Python
☆65 Updated 8 years ago
hernamesbarbara / table2csv
Extract data from an HTML table and store results to a csv file.
☆38 Updated 9 years ago
TeamHG-Memex / scrapy-kafka-export
Scrapy extension which writes crawled items to Kafka
☆30 Updated 6 years ago
opensecrets / OCRToolkit
Tools for working with Optical Character Recognition output
☆16 Updated 11 years ago
rmax / scrapy-boilerplate
Small set of utilities to simplify writing Scrapy spiders.
☆49 Updated 9 years ago