Siltaar / doc_crawler.py
Explore a website recursively and download all the wanted documents (PDF, ODT…)
☆20 · Updated 4 years ago
Alternatives and similar repositories for doc_crawler.py
Users who are interested in doc_crawler.py are comparing it to the libraries listed below.
- A minimalistic news aggregator built with Flask and powered by News API. ☆77 · Updated 4 months ago
- RSS feed reader for Python 3 ☆88 · Updated 3 years ago
- Tools that will make writing tests, bots and scrapers using Selenium much easier ☆139 · Updated last year
- Find the path of a key / value in a JSON hierarchy easily. ☆97 · Updated 9 months ago
- Sorter makes file organisation and management easier. ☆37 · Updated 7 years ago
- darknet.py is a network application with no dependencies other than Python and Tor, useful to anonymize the traffic of Linux servers and … ☆69 · Updated 4 years ago
- Upload any image, and the app will identify the object in it and translate its name into any language you want (read aloud) ☆42 · Updated 8 years ago
- A simple Qt WebEngine-powered web browser with built-in support for basic Scrapy web scraping. ☆110 · Updated last year
- Python library for extracting text from various file formats (for indexing). ☆114 · Updated 4 years ago
- Generative tree visualiser for Python ☆16 · Updated 5 years ago
- Scraper for categories and lists on ecommerce and other listing websites ☆43 · Updated 5 years ago
- Create Bootstrap 4 web pages using purely Python. ☆19 · Updated 9 months ago
- Search, Download, and Convert YouTube videos to MP3 ☆103 · Updated 4 years ago
- A Python client for Chrome's DevTools protocol / a headless chrome control library ☆15 · Updated 7 years ago
- Easy Html Parser is an AST generator for HTML/XML documents. You can easily delete/insert/extract tags in HTML/XML documents as well as l… ☆52 · Updated 6 years ago
- Python package to add text to images, textures and different backgrounds ☆156 · Updated last year
- A package, API, and command-line application to search for lyrics from different web sources ☆28 · Updated 3 years ago
- A storage backend for tinydb that stores database changes inside a git repository ☆17 · Updated 4 years ago
- A pure Python GUI app for GPG functionality and peer-to-peer encrypted messaging over Tor ☆71 · Updated 4 years ago
- A command-line web scraping tool ☆151 · Updated 2 years ago
- Scrapy middleware that allows crawling only new content ☆79 · Updated 2 weeks ago
- Chrome Debugging client for Python ☆33 · Updated 6 years ago
- The unofficial Amazon search CLI & Python API ☆112 · Updated 3 years ago
- Utility library to turn country names into ISO two-letter codes ☆71 · Updated 6 months ago
- Tools to easily generate an RSS feed containing each scraped item, using the Scrapy framework. ☆33 · Updated last week
- Access domain information via Python and the command line. ☆15 · Updated last year
- 🕷Configuration based html scraper ☆23 · Updated 3 months ago
- A declarative data-migration package ☆16 · Updated last year
- ☕🗄CAching Proxy in Python – Simple file based python http proxy ☆15 · Updated 2 months ago
- AnyAPI is a library that helps you write any API wrapper with ease and in a Pythonic way. ☆131 · Updated 4 years ago