cdimascio / essence
Automatically extract the main text content (and more) from an HTML document
☆117 · Updated 2 years ago
Alternatives and similar repositories for essence:
Users interested in essence are comparing it to the libraries listed below.
- Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages. ☆240 · Updated 2 weeks ago
- A Kotlin port of Mozilla's Readability. It extracts a website's relevant content and removes all clutter from it. ☆153 · Updated 3 years ago
- A fork of Dragnet that also extracts author, headline, date, and keywords from context, as well as built-in metadata extraction, all in one pac… ☆274 · Updated last year
- Life and collaboration assistant. ☆36 · Updated this week
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases. ☆121 · Updated last year
- SimpleDNN is a lightweight, open-source machine learning library written in Kotlin designed to support relevant neural network architectur… ☆98 · Updated 4 years ago
- StaticLog - super lightweight static logging for Kotlin, Java and Android ☆28 · Updated 7 years ago
- A web crawling framework written in Kotlin ☆128 · Updated 3 years ago
- NameKrea is an AI Domain Name Generator which uses GPT-2 ☆48 · Updated 2 years ago
- Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Fa… ☆282 · Updated 2 weeks ago
- Cloud crawler functions for scrapeulous ☆45 · Updated 4 years ago
- A natural language event parser for Java and Android. ☆103 · Updated 4 years ago
- A set of reusable Java components that implement functionality common to any web crawler ☆243 · Updated last week
- extJWNL (Extended Java WordNet Library) is a Java API for creating, reading and updating dictionaries in WordNet format. ☆128 · Updated last year
- NeuralParser is a very simple to use dependency parser, based on the Latent Syntactic Structure encoding. ☆20 · Updated 4 years ago
- A natural language date-time parser that extracts date and time from text with context and parses them into the required format ☆231 · Updated 6 months ago
- A simple HTML content extractor in Python. It can be run as a wrapper for Mozilla's Readability.js package or in pure-Python mode. ☆293 · Updated 4 months ago
- A human-friendly alternative to cron. Designed after GAE's schedule for Kotlin and/or Java 8+. ☆83 · Updated 3 years ago
- A Directory of Online Newspaper Sources for 70+ Languages ☆33 · Updated 3 years ago
- Multiplatform Kotlin Hello World (Android/iOS/Java/JavaScript/Native) ☆76 · Updated 8 months ago
- A dataset of multinational first names and last names ☆26 · Updated last year
- An implementation of Go-Links, written in Kotlin ☆39 · Updated 3 weeks ago
- Web scraper for Goodreads quotes 📚 ☆22 · Updated 3 years ago
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆71 · Updated last year
- A Python-based HTML-to-text conversion library, command-line client, and web service. ☆301 · Updated 2 weeks ago
- Experiments with Elasticsearch query builder for Kotlin ☆45 · Updated last week
- Common Crawl Index Server ☆67 · Updated last month
- Boilerplate Removal using Deep Learning ☆82 · Updated 3 years ago
- Article extraction benchmark: dataset and evaluation scripts ☆309 · Updated 11 months ago
- Fast and robust date extraction from web pages, with Python or on the command line ☆124 · Updated 3 months ago