cdimascio / essence
Automatically extract the main text content (and more) from an HTML document
☆118 Updated 3 years ago
Alternatives and similar repositories for essence
Users who are interested in essence are comparing it to the libraries listed below.
- Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages. ☆244 Updated 10 months ago
- A natural language date-time parser that extracts dates and times from text with context and parses them to the required format ☆245 Updated 3 months ago
- A Kotlin port of Mozilla's Readability. It extracts a website's relevant content and removes all clutter from it. ☆169 Updated 4 years ago
- Kotlin/Java library and CLI tool for scraping posts and media from various sources with neither authorization nor full page rendering (Fa… ☆322 Updated 2 weeks ago
- Life and collaboration assistant. ☆41 Updated last week
- Article extraction benchmark: dataset and evaluation scripts ☆351 Updated 4 months ago
- SimpleDNN is a lightweight open-source machine learning library written in Kotlin designed to support relevant neural network architectur… ☆102 Updated 5 years ago
- A fork of Dragnet that also extracts author, headline, date, keywords from context, as well as built-in metadata extraction all in one pac… ☆298 Updated 8 months ago
- StaticLog - super lightweight static logging for Kotlin, Java and Android ☆29 Updated 8 years ago
- A set of reusable Java components that implement functionality common to any web crawler ☆252 Updated last week
- Java client for txtai ☆40 Updated 2 weeks ago
- Java library to extract links (URLs, email addresses) from plain text; fast, small and smart ☆214 Updated 8 months ago
- The LAW next-generation crawler. ☆90 Updated 4 years ago
- A Java annotation library for Web scraping. ☆28 Updated 8 months ago
- The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike ☆793 Updated 10 months ago
- extJWNL (Extended Java WordNet Library) is a Java API for creating, reading and updating dictionaries in WordNet format. ☆131 Updated last year
- A language detection Web Service ☆53 Updated 8 years ago
- A Kotlin/Java API for generating .ts source files. ☆48 Updated 2 years ago
- Java natural language date parser ☆526 Updated 2 years ago
- Java autocomplete library. ☆120 Updated 5 years ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases. ☆128 Updated last week
- Multiplatform Kotlin Hello World (Android/iOS/Java/JavaScript/Native) ☆79 Updated 3 months ago
- Cloud crawler functions for scrapeulous ☆45 Updated 4 years ago
- A dataset of multinational first names and last names ☆27 Updated 2 years ago
- Simple OAuth 2.0 client written in Kotlin ☆25 Updated 8 years ago
- A Java library to determine probability of objects being similar. ☆258 Updated last month
- Experiments with Elasticsearch query builder for Kotlin ☆45 Updated 6 months ago
- Index Common Crawl archives in tabular format ☆125 Updated last month
- An overview of the AI-as-a-service landscape ☆161 Updated 7 years ago
- A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-Python mode. ☆352 Updated last year