bdheath / pytor
Python wrapper for scraping over the Tor network
☆26Updated 7 years ago
Alternatives and similar repositories for pytor:
Users who are interested in pytor are comparing it to the libraries listed below
- Extract social media links and account names from websites.☆38Updated 4 years ago
- A generic crawler☆78Updated 6 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python☆13Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy.☆41Updated 5 years ago
- A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)☆41Updated 7 years ago
- Paginating the web☆37Updated 11 years ago
- Streaming web crawler with WebSocket API☆44Updated last year
- API client for Aleph, supports bulk entity and document upload.☆28Updated 6 months ago
- Python clients for Zyte AutoExtract API☆40Updated 3 years ago
- Extract the difference between two HTML pages☆32Updated 6 years ago
- A component that tries to avoid downloading duplicate content☆27Updated 6 years ago
- Automates the process of repeatedly searching for a website via scraped proxy IP and search keywords☆44Updated last year
- Extract text from HTML☆135Updated 4 years ago
- Zyte Automatic Extraction integration for Scrapy☆56Updated 3 years ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the…☆36Updated 9 months ago
- Scrapy middleware for the autologin☆37Updated 6 years ago
- Scrapy integration with Tor for anonymous web scraping☆46Updated 9 years ago
- Rotating proxy crawler in Python☆82Updated 3 years ago
- Architecture for the Twint scraper that allows downloading tweets on many instances without API restrictions☆10Updated 4 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values …☆15Updated 6 years ago
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.☆150Updated 3 months ago
- A simple, Qt-Webengine powered web browser with built-in functionality for basic Scrapy web scraping support.☆109Updated 11 months ago
- A list of memex-related tools and their repository URLs☆149Updated 7 years ago
- An easy-to-use Python client for Google News feeds.☆50Updated 3 years ago
- Site Hound (previously THH) is a Domain Discovery Tool☆23Updated 3 years ago
- Detect and classify pagination links☆102Updated 4 years ago
- 👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping.☆46Updated 2 years ago
- Python library for the TinEye API☆29Updated last year
- Pipeline for distributed Natural Language Processing, made in Python☆64Updated 8 years ago
- Scrape Google search results with Scrapy.☆98Updated 5 years ago