apify / actor-page-analyzer
Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.
☆150 Updated last year
Alternatives and similar repositories for actor-page-analyzer:
Users that are interested in actor-page-analyzer are comparing it to the libraries listed below
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ☆69 Updated 3 years ago
- keywords-extract - Command line tool to extract keywords from any web page. ☆63 Updated 6 years ago
- Automatically extracts structured information from webpages ☆107 Updated 2 years ago
- Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages. ☆101 Updated 6 years ago
- Extract and decompose (fuzzy) URLs (including emails, which are conceptually a part of URLs) in texts with Area-Pattern-based modularity ☆351 Updated 3 weeks ago
- JavaScript Library for Google Sheets/Microsoft Excel Online through sheet2api. https://sheet2api.com/ ☆92 Updated 2 years ago
- Twitter AI Platform ☆93 Updated 7 years ago
- Rewriting web proxy and archival tool. At this point, it just tries to download all the things. ☆201 Updated this week
- Scrapy rotation proxy package with advanced functions ☆94 Updated 2 years ago
- An OPML file with 22 of the top 25 US newspapers RSS feeds ☆55 Updated 6 years ago
- A Python library to detect and extract listing data from an HTML page. ☆108 Updated 7 years ago
- A browser extension that lets you find email addresses for any domain with a single click. ☆71 Updated 7 years ago
- Wren enables users to discover and explore daily news stories 🗞️📻 📺 ☆259 Updated 6 years ago
- Kinase is a pluggable browser extension allowing you to label content on the web. ☆78 Updated 7 years ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases. ☆119 Updated last year
- Dead simple cron service for making HTTP calls on a regular schedule. ☆14 Updated 4 years ago
- An interactive demo walk-through we built to give visitors a feel for what the Trevor.io platform does ☆253 Updated 5 years ago
- Demo of how to self-host analytics.js ☆26 Updated last year
- Export your saved links on HN as JSON or CSV, with only a few keystrokes. ☆62 Updated last year
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆423 Updated 2 years ago
- Visualize the impact of current events on stocks ☆50 Updated 6 years ago
- An algorithm for generating robust XPath locators for web testing. ☆180 Updated 2 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆188 Updated 2 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆55 Updated last year
- Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Support… ☆110 Updated last year
- A hacky node.js ad-hoc throw-away address mail forwarder. ☆38 Updated 5 years ago
- Query CSVs using SQL ☆167 Updated 5 years ago
- A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and con… ☆379 Updated 2 years ago
- Track clicks and other client-side events on web pages ☆225 Updated 7 years ago
- Comprehensive wrapper and execution manager for the Chrome browser using the Chrome Debugging Protocol. ☆220 Updated 2 months ago