dossier / html-highlighter
Highlight and select phrases in HTML pages.
☆24 Updated 5 years ago
Alternatives and similar repositories for html-highlighter:
Users who are interested in html-highlighter are comparing it to the libraries listed below.
- ☆43 Updated 9 years ago
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors ☆34 Updated 10 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 Updated 3 years ago
- Semanticizest: dump parser and client ☆20 Updated 8 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 Updated 12 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 Updated 9 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago
- Trying to generate name synonyms from wikidata ☆32 Updated 4 years ago
- Browser add-on and web server to support collection and analysis of web browsing data. ☆13 Updated 9 years ago
- Pattern-of-Behavior Search Tool ☆11 Updated 2 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 3 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. ☆38 Updated 6 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 Updated 7 years ago
- General Architecture for Text Engineering ☆49 Updated 9 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 3 years ago
- Index URLs in Common Crawl ☆194 Updated 7 years ago
- ☆20 Updated 7 years ago
- Common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆35 Updated 8 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆95 Updated 6 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 Updated 3 years ago
- Aviation grade news article metadata extraction ☆37 Updated 2 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 Updated 3 years ago
- Open source large document set visualization platform ☆268 Updated 2 years ago
- FacetView is a pure javascript frontend for ElasticSearch. ☆290 Updated 9 years ago
- Data science tools from Moz ☆22 Updated 8 years ago
- Traptor -- A distributed Twitter feed ☆26 Updated 2 years ago
- A place to collect and share knowledge about liberating data from PDFs ☆54 Updated 3 years ago
- Topic modeling web application ☆40 Updated 9 years ago
- [UNMAINTAINED] Firefox addon for Scrapely ☆5 Updated 9 years ago
- Facet Search interface for MEMEX. ☆13 Updated 10 years ago