indix / web-auto-extractor
Automatically extracts structured information from webpages
☆108 · Updated 2 years ago
Alternatives and similar repositories for web-auto-extractor:
Users who are interested in web-auto-extractor are comparing it to the libraries listed below.
- Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON. ☆70 · Updated 3 years ago
- A `htmlparser2` handler for parsing rich metadata from HTML. Includes HTML metadata, JSON-LD, RDFa, microdata, OEmbed, Twitter cards and … ☆54 · Updated last year
- Scrape & parse a webpage to return a JSON with found microdata (schema.org) ☆43 · Updated 7 years ago
- US Street Address Parser ☆163 · Updated last year
- Cheerio based microdata parser ☆58 · Updated 3 years ago
- NodeJS bindings to libpostal for fast international address parsing/normalization ☆229 · Updated 2 months ago
- Freeform Street Address Parser ☆95 · Updated last year
- Helps to extract shortest optimal css-selector and multi-selector. ☆26 · Updated 7 years ago
- A suite of modules for text analysis, including simple analysis, nGrams, and TFIDF analysis ☆48 · Updated 4 years ago
- Deprecated plugin to detect sentiment: use `words/polarity` ☆97 · Updated 5 months ago
- A node.js wrapper for Boilerpipe, an excellent Java library for boilerplate removal and fulltext extraction from HTML pages. ☆52 · Updated 7 years ago
- Extracts email address from an arbitrary text input. ☆62 · Updated 2 months ago
- plugin to extract keywords and key-phrases ☆333 · Updated 5 months ago
- Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSO… ☆150 · Updated 2 years ago
- English NLP for Node.js and the browser. ☆87 · Updated last year
- Friendly web crawler for x-ray ☆44 · Updated 2 years ago
- LDA topic modeling for node.js ☆297 · Updated 7 months ago
- Find rss feeds url ☆72 · Updated 2 years ago
- Scrape/Crawl article from any site automatically. Make any web page readable, no matter Chinese or English. ☆344 · Updated 6 years ago
- A NodeJS implementation of the Rapid Automatic Keyword Extraction algorithm. ☆103 · Updated last year
- A nodejs Scraping Utility for lazy people. MIT Licensed ☆44 · Updated 3 years ago
- Node library to extract keywords from text ☆58 · Updated 9 years ago
- A Javascript implementation of the Rapid Automated Keyword Extraction (RAKE) algorithm ☆17 · Updated 5 years ago
- parse and get all utm parameters ☆41 · Updated last year
- Node.js client for the Alexa Web Information Service ☆37 · Updated 4 years ago
- Node wrapper around FastText Library ☆57 · Updated 2 years ago
- This stemming module for Node.js provides stemming capability for a variety of languages using Dr. M.F. Porter's Snowball API. ☆51 · Updated 3 weeks ago
- PhantomJS resource pool based on generic-pool ☆106 · Updated 5 years ago
- bag-of-words calculator in javascript ☆135 · Updated 4 years ago
- Maxmind GeoIP2 Web Services for Node.js ☆47 · Updated 2 weeks ago