cdimascio / essence

Automatically extract the main text content (and more) from an HTML document

☆118

Alternatives and similar repositories for essence

Users interested in essence are comparing it to the libraries listed below.

chimbori / crux
Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages.
☆244 · Updated 10 months ago
zoho / hawking
A natural-language date/time parser that extracts dates and times from text using context and parses them into the required format
☆245 · Updated 3 months ago
dankito / Readability4J
A Kotlin port of Mozilla's Readability. It extracts a website's relevant content and removes all clutter from it.
☆169 · Updated 4 years ago
sokomishalov / skraper
Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Fa…
☆322 · Updated 2 weeks ago
Heapy / komok
Life and collaboration assistant.
☆41 · Updated last week
scrapinghub / article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
☆351 · Updated 4 months ago
KotlinNLP / SimpleDNN
SimpleDNN is a lightweight, open-source machine learning library written in Kotlin, designed to support relevant neural network architectur…
☆102 · Updated 5 years ago
currentslab / extractnet
A fork of Dragnet that also extracts author, headline, date, and keywords from context, as well as built-in metadata extraction, all in one pac…
☆298 · Updated 8 months ago
jupf / staticlog
StaticLog - super lightweight static logging for Kotlin, Java and Android
☆29 · Updated 8 years ago
crawler-commons / crawler-commons
A set of reusable Java components that implement functionality common to any web crawler
☆252 · Updated last week
neuml / txtai.java
Java client for txtai
☆40 · Updated 2 weeks ago
robinst / autolink-java
Java library to extract links (URLs, email addresses) from plain text; fast, small and smart
☆214 · Updated 8 months ago
LAW-Unimi / BUbiNG
The LAW next generation crawler.
☆90 · Updated 4 years ago
beothorn / webGrude
A Java annotation library for web scraping.
☆28 · Updated 8 months ago
pemistahl / lingua
The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
☆793 · Updated 10 months ago
extjwnl / extjwnl
extJWNL (Extended Java WordNet Library) is a Java API for creating, reading and updating dictionaries in WordNet format.
☆131 · Updated last year
deezer / weslang
A language detection Web Service
☆53 · Updated 8 years ago
outfoxx / typescriptpoet
A Kotlin/Java API for generating .ts source files.
☆48 · Updated 2 years ago
joestelmach / natty
Java natural language date parser
☆526 · Updated 2 years ago
fmmfonseca / completely
Java autocomplete library.
☆120 · Updated 5 years ago
apify / actor-scraper
House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
☆128 · Updated last week
wojta / hello-kotlin-multiplatform
Multiplatform Kotlin Hello World (Android/iOS/Java/JavaScript/Native)
☆79 · Updated 3 months ago
NikolaiT / scrapeulous
Cloud crawler functions for scrapeulous
☆45 · Updated 4 years ago
solvenium / names-dataset
A dataset of multinational first names and last names
☆27 · Updated 2 years ago
mazine / oauth2-client-kotlin
Simple OAuth 2.0 client written in Kotlin
☆25 · Updated 8 years ago
intuit / fuzzy-matcher
A Java library to determine the probability of objects being similar.
☆258 · Updated last month
anti-social / elasticmagic-kt
Experiments with Elasticsearch query builder for Kotlin
☆45 · Updated 6 months ago
commoncrawl / cc-index-table
Index Common Crawl archives in tabular format
☆125 · Updated last month
sekwiatkowski / awesome-ai-services
An overview of the AI-as-a-service landscape
☆161 · Updated 7 years ago
alan-turing-institute / ReadabiliPy
A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.
☆352 · Updated last year