cdimascio / essence
Automatically extract the main text content (and more) from an HTML document
☆115 · Updated 2 years ago
Alternatives and similar repositories for essence:
Users who are interested in essence are comparing it to the libraries listed below.
- Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages. ☆239 · Updated last month
- A Kotlin port of Mozilla's Readability. It extracts a website's relevant content and removes all clutter from it. ☆150 · Updated 3 years ago
- Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Fa… ☆267 · Updated 2 weeks ago
- Article extraction benchmark: dataset and evaluation scripts ☆300 · Updated 9 months ago
- SimpleDNN is a machine learning lightweight open-source library written in Kotlin designed to support relevant neural network architectur… ☆98 · Updated 4 years ago
- extJWNL (Extended Java WordNet Library) is a Java API for creating, reading and updating dictionaries in WordNet format. ☆127 · Updated 10 months ago
- NeuralParser is a very simple to use dependency parser, based on the Latent Syntactic Structure encoding. ☆20 · Updated 4 years ago
- Life and collaboration assistant. ☆33 · Updated this week
- Java port of SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm ☆66 · Updated 4 years ago
- WordNet in JSON format. ☆91 · Updated 4 years ago
- Java library to extract links (URLs, email addresses) from plain text; fast, small and smart ☆207 · Updated 2 months ago
- IBM Q Experience Kotlin toolkit - Kotlin library to interact and write assembly code for IBM Quantum computers ☆16 · Updated 6 years ago
- A web crawling framework written in Kotlin ☆127 · Updated 3 years ago
- Repair & enhance embedded thumbnails ☆12 · Updated 3 weeks ago
- StaticLog - super lightweight static logging for Kotlin, Java and Android ☆28 · Updated 7 years ago
- Boilerplate Removal using Deep Learning ☆81 · Updated 3 years ago
- A set of reusable Java components that implement functionality common to any web crawler ☆240 · Updated last month
- Multiplatform Kotlin Hello World (Android/iOS/Java/JavaScript/Native) ☆76 · Updated 6 months ago
- Java client for txtai ☆36 · Updated 3 weeks ago
- A Directory of Online Newspaper Sources for 70+ Languages ☆32 · Updated 3 years ago
- Index Common Crawl archives in tabular format ☆110 · Updated 2 months ago
- The largest English-language thesaurus ☆287 · Updated 2 years ago
- Fauxflake is an easily embeddable, decentralized, k-ordered unique ID generator. ☆41 · Updated 8 years ago
- Python code for building a GPT-3 based technical blog post optimizer. ☆84 · Updated 2 years ago
- Use chromaprint library easily on Android with fpcalc-android ☆14 · Updated 6 years ago
- A logger facilitating lazily-evaluated log calls via Kotlin's inline classes & functions. ☆89 · Updated last year
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆71 · Updated 9 months ago
- A natural language date parser (Java port of chrono.js) ☆28 · Updated 10 years ago
- Experiments with Elasticsearch query builder for Kotlin ☆45 · Updated last month
- Logquacious (lq) is a fast and simple log viewer. ☆59 · Updated 2 years ago