cdimascio / essence
Automatically extract the main text content (and more) from an HTML document
☆118Updated 3 years ago
Alternatives and similar repositories for essence
Users that are interested in essence are comparing it to the libraries listed below
- Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages.☆241Updated 5 months ago
- A natural language date-time parser that extracts dates and times from text with context and parses them into the required format☆242Updated 11 months ago
- Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Fa…☆298Updated this week
- Life and collaboration assistant.☆35Updated this week
- A Kotlin port of Mozilla's Readability. It extracts a website's relevant content and removes all clutter from it.☆162Updated 3 years ago
- Java library to extract links (URLs, email addresses) from plain text; fast, small and smart☆212Updated 2 months ago
- SimpleDNN is a machine learning lightweight open-source library written in Kotlin designed to support relevant neural network architectur…☆102Updated 5 years ago
- The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike☆768Updated 5 months ago
- Article extraction benchmark: dataset and evaluation scripts☆321Updated last year
- A set of reusable Java components that implement functionality common to any web crawler☆246Updated 2 weeks ago
- Java client for txtai☆38Updated 2 months ago
- A simple Java library for reading RSS and Atom feeds☆180Updated this week
- Matrix-based News Aggregation to Explore Media Bias☆20Updated 7 years ago
- Readability clone in Java☆460Updated 4 years ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine☆183Updated 8 months ago
- A natural language event parser for Java and Android.☆103Updated 4 years ago
- Google Search Results JAVA API via SerpApi☆45Updated 2 months ago
- Use chromaprint library easily on Android with fpcalc-android☆13Updated 6 years ago
- A fork of Dragnet that also extracts author, headline, date, and keywords from context, as well as built-in metadata extraction, all in one pac…☆292Updated 3 months ago
- A language detection Web Service☆53Updated 8 years ago
- Java library to build modern applications with high-def itemized financial data. OCR, AI, and NLP for receipts, invoices, bills, and RFC8…☆17Updated 5 months ago
- NameKrea is an AI Domain Name Generator which uses GPT-2☆50Updated 2 years ago
- A Directory of Online Newspaper Sources for 70+ Languages☆32Updated 4 years ago
- A web crawling framework written in Kotlin☆131Updated 4 years ago
- A Java library for the Giphy API.☆28Updated 8 years ago
- Logquacious (lq) is a fast and simple log viewer.☆60Updated 3 years ago
- StaticLog - super lightweight static logging for Kotlin, Java and Android☆29Updated 7 years ago
- Statistics of Common Crawl monthly archives mined from URL index files☆189Updated this week
- Treat your Dockerfiles as self-contained, editable scripts☆103Updated 4 years ago
- XTractor is an algorithmic text extractor from web pages written in Java. It builds upon the "commonly used web design practices" approac…☆43Updated 9 years ago