indix / web-auto-extractor
Automatically extracts structured information from webpages
☆108Updated 2 years ago
Alternatives and similar repositories for web-auto-extractor:
Users who are interested in web-auto-extractor are comparing it to the libraries listed below
- A `htmlparser2` handler for parsing rich metadata from HTML. Includes HTML metadata, JSON-LD, RDFa, microdata, OEmbed, Twitter cards and …☆54Updated last year
- Freeform Street Address Parser☆95Updated 2 years ago
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON.☆70Updated 3 years ago
- Helps to extract shortest optimal css-selector and multi-selector.☆26Updated 7 years ago
- Deprecated plugin to detect sentiment: use `words/polarity`☆97Updated 6 months ago
- Scrape & parse a webpage to return a JSON with found microdata (schema.org)☆43Updated 7 years ago
- Friendly web crawler for x-ray☆44Updated 2 years ago
- Cheerio based microdata parser☆58Updated 3 years ago
- A suite of modules for text analysis, including simple analysis, nGrams, and TFIDF analysis☆48Updated 4 years ago
- English NLP for Node.js and the browser.☆87Updated last year
- Tokenize paragraphs into sentences, and smaller tokens.☆48Updated last year
- Client for Stanford Named Entity Recognition☆27Updated 6 years ago
- JavaScript code to split names into their respective components (first, last, etc)☆111Updated 8 years ago
- Parser for robots.txt for node.js☆67Updated 4 years ago
- Node library to extract keywords from text☆58Updated 9 years ago
- MetaData html scraper and parser for Node.js (supports Promises and callback style)☆175Updated last week
- NodeJS bindings to libpostal for fast international address parsing/normalization☆231Updated 3 months ago
- Article content extraction database☆40Updated 2 years ago
- Experimental Nightmare plugin for real mouse events☆69Updated 5 years ago
- Identifies and extracts phone numbers from arbitrary text☆39Updated 7 years ago
- schema.org in JS (work in progress)☆44Updated 2 years ago
- Access properties of nested objects using dot-path notation☆127Updated 2 years ago
- Higher level client for Elasticsearch written in Node.js oriented on facets and simplicity☆20Updated 3 months ago
- ☆21Updated 7 years ago
- text mining utilities for Node.js☆141Updated 2 years ago
- PhantomJS resource pool based on generic-pool☆106Updated 5 years ago
- Similarity identification in JS☆36Updated last year
- A lightweight JavaScript client library for the Wikimedia Pageviews API for Wikipedia and various of its sister projects for Node.js and …☆27Updated 4 years ago
- This stemming module for Node.js provides stemming capability for a variety of languages using Dr. M.F. Porter's Snowball API.☆51Updated last month
- Multilingual tokenizer that automatically tags each token with its type☆61Updated 2 years ago