edx / pa11ycrawler
Python crawler (using Scrapy) that uses Pa11y to check accessibility of pages as it crawls.
☆17 · Updated 5 years ago
Related projects
Alternatives and complementary repositories for pa11ycrawler
- Extract the difference between two HTML pages ☆32 · Updated 6 years ago
- This is a REST Server endpoint built using Flask and Python. ☆24 · Updated last year
- NLP crowdsourcing platform for word-level annotations ☆11 · Updated 5 years ago
- Automated NLP sentiment predictions: batteries included, or use your own data ☆18 · Updated 6 years ago
- Analyze topics and trends in news with NLP ☆48 · Updated last year
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh. ☆11 · Updated last year
- Find which links on a web page are pagination links ☆29 · Updated 7 years ago
- Pure Python script that takes a user query and summarizes news related to it. ☆25 · Updated 2 years ago
- Efficiently search the most similar strings against the query in Python. ☆18 · Updated 6 years ago
- Scrapy project with spiders to extract article content from various German news sites ☆21 · Updated 11 years ago
- (BROKEN, help wanted) ☆15 · Updated 8 years ago
- Traptor -- A distributed Twitter feed ☆26 · Updated 2 years ago
- Tools for analyzing the Hillary Clinton emails ☆13 · Updated 8 years ago
- Demo of the Newspaper article extraction library. ☆29 · Updated 9 years ago
- Spell correct entire sentences using nltk freqdist and symspell ☆19 · Updated 7 years ago
- Pipeline for distributed Natural Language Processing, made in Python ☆65 · Updated 7 years ago
- Small set of utilities to simplify writing Scrapy spiders. ☆49 · Updated 9 years ago
- Scrapy pipeline which allows you to store Scrapy items in a Solr server. ☆19 · Updated 8 years ago
- Slides to learn a little natural language processing (NLP) with Python. Written in reST with S5/Docutils. ☆28 · Updated 12 years ago
- Python module for Named Entity Recognition (NER) using natural language processing. ☆14 · Updated 3 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 · Updated 9 months ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated 6 years ago
- Easy language identification of 380 languages ☆18 · Updated 4 years ago
- A Scrapy extension to store request and response information in a storage service ☆26 · Updated 2 years ago
- Common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆34 · Updated 8 years ago
- Scraper built with Scrapy. ☆14 · Updated 3 months ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 2 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 · Updated 3 years ago
- Lightweight library that converts an HTML webpage to JSON data using a template defined in JSON. ☆21 · Updated 4 years ago