apify / actor-page-analyzer
Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.
☆152Updated 2 years ago
Alternatives and similar repositories for actor-page-analyzer
Users who are interested in actor-page-analyzer are comparing it to the libraries listed below
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON.☆70Updated 4 years ago
- Rewriting web proxy and archival tool. At this point, it just tries to download all the things.☆204Updated last week
- Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages.☆105Updated 7 years ago
- Twitter AI Platform☆95Updated 8 years ago
- Flask code to deploy an API that pulls structured data from online news articles☆229Updated 2 years ago
- keywords-extract - Command-line tool to extract keywords from any web page.☆63Updated 6 years ago
- Dashboard is software for creating web apps and SaaS (support @ freenode #userdashboard)☆282Updated 4 years ago
- Track clicks and other client-side events on web pages☆225Updated 7 years ago
- Tool for real-time scraping of news articles.☆39Updated 5 years ago
- Extract and decompose (fuzzy) URLs (including emails, which are conceptually a part of URLs) in texts with Area-Pattern-based modularity☆355Updated 7 months ago
- An OPML file with 22 of the top 25 US newspapers RSS feeds☆56Updated 6 years ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.☆432Updated 2 years ago
- OpenFaaS template for headless Chrome and Puppeteer☆92Updated last year
- Remote client for distributed automated HTTP(s) content fetching.☆78Updated last week
- midas is a framework that enables you to enrich your CSV, JSON or Excel dataset with any web API you can think of.☆53Updated 7 years ago
- Wren enables users to discover and explore daily news stories 🗞️📻 📺☆257Updated 7 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends☆57Updated last year
- Sheetsu Web Client☆178Updated 7 years ago
- A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.☆117Updated last year
- A Python library to detect and extract listing data from an HTML page.☆108Updated 8 years ago
- Extract a list of keywords from a website, sorted by word count.☆52Updated 9 years ago
- 📮 Dialogflow + Sendgrid = AI Mailbox☆34Updated 5 years ago
- Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head☆171Updated 5 years ago
- File-system-based database (in the git repo), with a server attached with users and access control for serving this data. See an example …☆63Updated 2 years ago
- Google2Csv: a simple Google scraper that saves the results to a CSV/XLSX/JSONL file☆170Updated 5 years ago
- Kinase is a pluggable browser extension allowing you to label content on the web.☆78Updated 8 years ago
- A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.☆276Updated 3 years ago
- unformatted text > parse/clean it > get relevant info☆52Updated 6 years ago
- Backup of seventag when it was still open source☆42Updated 7 years ago
- A web app to create and browse text visualizations for automated customer listening.☆148Updated last year