kevin91nl / website-scrape-and-deploy
Scrape a website and deploy to Amazon S3 to generate a serverless website.
☆13Updated 7 years ago
Alternatives and similar repositories for website-scrape-and-deploy
Users who are interested in website-scrape-and-deploy are comparing it to the libraries listed below
- This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by th…☆44Updated 3 years ago
- Inspect a URL and estimate if it contains a news story☆39Updated 9 months ago
- A Los Angeles Times analysis of helicopter accident rates☆10Updated 4 years ago
- Home of the IPTC ninjs standard☆38Updated last month
- A toolkit to make serverless swagger-based REST services simple using AWS API Gateway and Lambda☆39Updated 9 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy.☆41Updated 6 years ago
- PhantomJS/Node.js web scraper for AWS Lambda☆95Updated 9 years ago
- Serverless Flask on AWS Lambda + API Gateway☆86Updated 2 years ago
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework☆167Updated 3 years ago
- Simple taxonomy management tool and document classifier.☆56Updated 5 years ago
- Using ML to extract campaign finance data from messy forms for journalism☆77Updated 3 years ago
- [DEPRECATED] A bare bones Serverless Framework project with examples for common use cases in Python.☆35Updated 8 years ago
- Binary Python bindings for poppler utils for content extraction☆42Updated 4 years ago
- AWS Lambda functions to extract text from various binary formats.☆177Updated 7 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency.☆190Updated 3 years ago
- Code for Newslynx App☆22Updated 9 years ago
- Deprecated Module: See Xponents or OpenSextantToolbox as active code base.☆31Updated 12 years ago
- cvStrap is a classic, clean, professional theme for the JSONResume schema with print-ready and responsive stylesheets, based on Bootstra…☆15Updated 9 years ago
- ☆21Updated 7 years ago
- View, visualize, clean and process data in the browser.☆147Updated 7 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends☆57Updated last year
- Data pipeline for streaming, processing, and analyzing the GDELT global events dataset.☆10Updated 8 years ago
- A simple platform for managing structured data.☆27Updated 3 years ago
- ☆57Updated 12 years ago
- A simple Python library/tool for pulling location information from unstructured text☆186Updated 14 years ago
- We scan thousands of government websites to check how well they stack up on security, accessibility, and public accountability.☆27Updated 2 years ago
- Demo project showing how to create a simple web scraping service using AWS Lambda and API Gateway☆89Updated 9 years ago
- FacetView is a pure javascript frontend for ElasticSearch.☆291Updated 10 years ago
- Easy extraction of keywords and engines from search engine results pages (SERPs).☆90Updated 3 years ago
- Topic modelling with SpaCy, Gensim and Textacy☆19Updated 7 years ago