sangaline / wayback-machine-scraper
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
☆447 Updated last year
Alternatives and similar repositories for wayback-machine-scraper
Users who are interested in wayback-machine-scraper are comparing it to the libraries listed below.
- A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine. ☆115 Updated last year
- A Python scraper for the Facebook Ad Library, using the official Facebook Ad Library API. ☆119 Updated 5 years ago
- Wayback Machine API interface & a command-line tool ☆526 Updated last year
- Facepager was made for fetching publicly available data from YouTube, Twitter and other websites on the basis of APIs and web scraping. ☆524 Updated last month
- A list of scrapers from around the web. ☆671 Updated 3 months ago
- Data model and processing tools for investigative entity data ☆230 Updated this week
- YTDT is a collection of simple tools for extracting data from the YouTube platform via the YouTube API v3. ☆125 Updated 7 months ago
- The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms. ☆301 Updated this week
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/ ☆190 Updated 6 years ago
- Extract text from HTML ☆135 Updated 4 years ago
- Automatic scraper that tracks changes in news articles over time. ☆501 Updated 4 years ago
- Digital Methods Initiative - Twitter Capture and Analysis Toolset ☆369 Updated 6 months ago
- UNOFFICIAL Python API to interface with Parler.com ☆53 Updated 9 months ago
- A scrapy project to extract the text and metadata of articles from news websites ☆73 Updated 3 years ago
- JavaScript scraping module based on puppeteer for many different search engines... ☆560 Updated 2 years ago
- A Scrapy middleware to bypass Cloudflare's anti-bot protection ☆110 Updated 3 years ago
- An automated, programming-free web scraper for interactive sites ☆110 Updated last year
- Now included in rigour ☆151 Updated last week
- Python client for the SimilarWeb API ☆45 Updated 8 years ago
- Minimal set of tools to conduct stealthy scraping. ☆156 Updated 2 years ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆172 Updated 4 months ago
- track changes to the news, where news is anything with an RSS feed ☆178 Updated 4 years ago
- A Python 3 library and a corresponding command line utility for accessing old tweets ☆366 Updated last year
- ☆26 Updated 4 years ago
- Article extraction benchmark: dataset and evaluation scripts ☆315 Updated last year
- Python library for reading and writing warc files ☆240 Updated 3 years ago
- Scrapes posts and comments from public Facebook pages. ☆108 Updated 6 years ago
- A command line tool (and Python library) for archiving Twitter JSON ☆1,380 Updated last year
- IA's public Wayback Machine (moved from SourceForge) ☆787 Updated last year
- A Twitter search client mining tweets using their advanced search implementation. ☆90 Updated 6 years ago