dossier / html-highlighter
Highlight and select phrases in HTML pages.
☆24 Updated 5 years ago
Alternatives and similar repositories for html-highlighter
Users who are interested in html-highlighter are comparing it to the libraries listed below
- Index URLs in Common Crawl ☆194 Updated 7 years ago
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors ☆35 Updated 10 years ago
- FacetView is a pure javascript frontend for ElasticSearch. ☆291 Updated 10 years ago
- a pure javascript frontend for ElasticSearch search indices. ☆80 Updated 7 years ago
- A Python library to detect and extract listing data from an HTML page. ☆108 Updated 8 years ago
- ☆43 Updated 9 years ago
- Aviation grade news article metadata extraction ☆36 Updated 2 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 Updated 3 years ago
- command-line tool to extract taxonomies from Wikidata ☆127 Updated 6 years ago
- Semanticizest: dump parser and client ☆20 Updated 9 years ago
- mltk - Moz Language Tool Kit ☆12 Updated 10 years ago
- Automatically extracts and normalizes an online article or blog post publication date ☆117 Updated last year
- A trend viewer written in Python/JavaScript ☆21 Updated 8 months ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 Updated 3 years ago
- Solr Dictionary Annotator (Microservice for Spark) ☆71 Updated 5 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 Updated 12 years ago
- An intelligent reading agent that understands text and translates it into Wikidata statements. ☆116 Updated 9 years ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders. ☆11 Updated 10 years ago
- Browser add-on and web server to support collection and analysis of web browsing data. ☆13 Updated 9 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 8 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. ☆38 Updated 6 years ago
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework ☆167 Updated 3 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 Updated 4 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆37 Updated last year
- Specification of NAF, the NLP annotation format ☆21 Updated 4 years ago
- Extraction Toolkit ☆83 Updated 3 years ago
- Trying to generate name synonyms from wikidata ☆32 Updated 5 years ago
- Seed acquisition tool to bootstrap focused crawlers ☆23 Updated 8 years ago
- Open source large document set visualization platform ☆268 Updated 2 years ago