xissy / node-boilerpipe
A node.js wrapper for Boilerpipe, an excellent Java library for boilerplate removal and fulltext extraction from HTML pages.
☆52 Updated 7 years ago
Alternatives and similar repositories for node-boilerpipe:
Users who are interested in node-boilerpipe are comparing it to the libraries listed below.
- A simple node.js wrapper for Stanford CoreNLP. ☆75 Updated 2 years ago
- A simple node.js wrapper for stanford-core-nlp. ☆148 Updated 7 years ago
- A simple-but-useful kNN library for NodeJS, comparing JSON Objects using Euclidean distances ☆214 Updated 9 years ago
- Friendly web crawler for x-ray ☆44 Updated last year
- A PredictionIO 0.9+ client ☆60 Updated 6 years ago
- Node npm for web scraping purposes. It scrapes a given URL, and returns you its title, meta description, meta keywords, an array with all… ☆129 Updated 5 years ago
- fetch & parse ATOM & RSS feeds with Node.js ☆74 Updated 6 years ago
- Tokenize paragraphs into sentences, and smaller tokens. ☆48 Updated last year
- Node.js module to extract and summarize html content ☆42 Updated 10 years ago
- phantom driver for x-ray. ☆111 Updated 8 years ago
- Node.js module for the Bing Search API (Cognitive Services) ☆56 Updated 3 years ago
- Node library to extract keywords from text ☆58 Updated 9 years ago
- Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they a… ☆41 Updated 8 years ago
- Framework for setting up RESTful JSON APIs with NodeJS. ☆117 Updated 8 years ago
- A portable, persistent, electron-embeddable fulltext search + document store database for node.js ☆252 Updated 7 years ago
- Freeform Street Address Parser ☆94 Updated last year
- npm install stopwords ☆44 Updated 7 years ago
- Extract the content of any web page by using various content extractor libraries. ☆10 Updated 8 years ago
- A helper robot written in node javascript ☆74 Updated 12 years ago
- sandcrawler.js - the server-side scraping companion. ☆107 Updated 9 years ago
- NodeJS Named Entity Recognition, using Stanford NER (easy install) ☆40 Updated 7 years ago
- ImageResolver.js does its best to determine the main image on a URL without loading all images. ☆162 Updated 7 years ago
- WordNet Database files (previously WNdb) ☆215 Updated 5 years ago
- Apache Tika bridge for Node.js. Text and metadata extraction, language detection and more. ☆141 Updated last year
- Redis time series statistics with Node.js ☆181 Updated 8 years ago
- Highly scalable Node.js scraping framework for mobsters ☆298 Updated 2 years ago
- Front-end web interface for Bull Job Manager ☆98 Updated 2 years ago
- An advanced session store for NodeJS and Redis ☆121 Updated 11 months ago