alexksikes / mass-scraping
Quickly download and scrape websites on a massive scale.
☆64 Updated 12 years ago
Alternatives and similar repositories for mass-scraping:
Users who are interested in mass-scraping are comparing it to the libraries listed below:
- API - extract a list of keywords from a text. ☆18 Updated 7 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆55 Updated last year
- Scrapes sites. Gets news. Eventually events. ☆84 Updated 8 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆24 Updated 8 years ago
- Summary is a complete solution to extract the title, image and description from any URL. ☆18 Updated last year
- This script will scrape the product data with given search keywords. ☆27 Updated 4 years ago
- ScraperWiki Python library for scraping and saving data ☆159 Updated 2 years ago
- Automates the process of repeatedly searching for a website via scraped proxy IP and search keywords ☆44 Updated last year
- Google SEO scraper for "allintitle:keyword" queries. ☆23 Updated 10 years ago
- Scrapy middleware which allows to crawl only new content ☆80 Updated 2 years ago
- scraping from walmart, target and homedepot website and getting data from amazon api ☆15 Updated 8 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 Updated 3 years ago
- extract difference between two html pages ☆32 Updated 6 years ago
- A library to interface with the Linkscape API. ☆40 Updated 6 years ago
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- A python tool to extract data types such as email, URL, domains and phone numbers. ☆37 Updated 11 years ago
- Some Python scripts I use for auditing, research and lead generation. ☆29 Updated 8 years ago
- Get data about companies from advanced search without the use of API ☆61 Updated 5 years ago
- Collection of python scripts I have created to crawl various websites, mostly for lead generation projects to match keywords and collect … ☆131 Updated last year
- THE LOCAL SEO DOMINATOR - CONTENT MANAGEMENT SYSTEM AND SITEMAP MODULE The Local SEO Dominator is a light-weight content management syst… ☆23 Updated 4 years ago
- An automatic proxy rotator - multithreaded & SSL ☆81 Updated 3 years ago
- 👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping. ☆46 Updated 2 years ago
- Seed acquisition tool to bootstrap focused crawlers ☆23 Updated 7 years ago
- Crawler and scraper of the public directory of companies on LinkedIn. ☆25 Updated 5 years ago
- SEO Tool to track ranking of keywords on search engines (google app engine application) ☆48 Updated 12 years ago
- Social media monitoring tools such as sentiment analysis, keyword tracking and more ☆47 Updated 11 years ago
- Extract social media links and account names from websites. ☆37 Updated 4 years ago
- Text analysis for automatic bookmarking/keyword extraction ☆18 Updated 8 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆46 Updated 6 years ago
- Social Media Post scheduler ☆22 Updated 8 years ago