N0taN3rd / simplechrome
Webrecorder's DevTools Protocol Automation Library
☆18 · Updated 3 years ago
Alternatives and similar repositories for simplechrome
Users who are interested in simplechrome are comparing it to the libraries listed below.
- Python WSGI middleware for adding HTTP/S proxy support to any WSGI application ☆24 · Updated 5 years ago
- Tool to flatten a stream of JSON-like objects, configured via schema ☆33 · Updated 6 years ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated last week
- Extract the difference between two HTML pages ☆32 · Updated last week
- URL canonicalization library for Python and Java ☆36 · Updated 3 years ago
- A Python implementation of DEPTA ☆83 · Updated 9 years ago
- A generic crawler ☆78 · Updated last week
- Find which links on a web page are pagination links ☆29 · Updated 9 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆192 · Updated 3 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆47 · Updated 8 years ago
- A Whoosh-based CLI indexer and searcher for your files. ☆16 · Updated 9 years ago
- Sort-friendly URI Reordering Transform (SURT) Python module ☆44 · Updated 4 months ago
- Extract, parse and populate templates from strings ☆27 · Updated 6 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆64 · Updated 5 months ago
- An attempt at creating a silver/gold standard dataset for backtesting yesterday's and today's content extractors ☆35 · Updated 10 years ago
- Find the path of a key / value in a JSON hierarchy easily. ☆97 · Updated 8 months ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆157 · Updated 4 months ago
- Copy the contents of one SQL database to another ☆27 · Updated 3 years ago
- Utility library to turn country names into ISO two-letter codes ☆71 · Updated 5 months ago
- Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe… ☆55 · Updated last year
- An expandable and scalable OCR pipeline ☆89 · Updated 8 years ago
- Scrapy schema validation pipeline and Item builder using JSON Schema ☆45 · Updated 4 years ago
- Small set of utilities to simplify writing Scrapy spiders. ☆49 · Updated 10 years ago
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 · Updated 8 years ago
- Show a summary of a large number of URLs in a Jupyter Notebook ☆17 · Updated last week
- Python bindings to the Tesseract API ☆66 · Updated 9 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning ☆119 · Updated last week
- CHeSF is the Chrome Headless Scraping Framework, very alpha code for scraping JavaScript-intensive web pages ☆20 · Updated 7 years ago
- Lets you apply a Python expression to a command's output, like Perl or Awk would do ☆94 · Updated 5 years ago
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C). ☆24 · Updated 11 years ago