dossier / html-highlighter
Highlight and select phrases in HTML pages.
☆24 Updated 5 years ago
Related projects
Alternatives and complementary repositories for html-highlighter
- ☆42 Updated 8 years ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders. ☆11 Updated 9 years ago
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors ☆34 Updated 9 years ago
- Browser add-on and web server to support collection and analysis of web browsing data. ☆13 Updated 8 years ago
- General Architecture for Text Engineering ☆45 Updated 8 years ago
- [UNMAINTAINED] Firefox addon for Scrapely ☆5 Updated 8 years ago
- mltk - Moz Language Tool Kit ☆12 Updated 9 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆46 Updated 2 years ago
- Topic modeling web application ☆39 Updated 9 years ago
- Semanticizest: dump parser and client ☆20 Updated 8 years ago
- MITIE: library and tools for information extraction ☆29 Updated 9 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆14 Updated 9 years ago
- A Relaxed Schema Graph Database Management System ☆53 Updated 4 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 Updated 12 years ago
- Index URLs in Common Crawl ☆193 Updated 7 years ago
- Common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆34 Updated 8 years ago
- Supervised learning for novelty detection in text ☆79 Updated 8 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. ☆38 Updated 6 years ago
- Seed acquisition tool to bootstrap focused crawlers ☆23 Updated 7 years ago
- Pattern-of-Behavior Search Tool ☆11 Updated 2 years ago
- FacetView is a pure JavaScript frontend for ElasticSearch. ☆291 Updated 9 years ago
- A pure JavaScript frontend for ElasticSearch search indices. ☆79 Updated 6 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 Updated 3 years ago
- Trying to generate name synonyms from Wikidata ☆33 Updated 4 years ago
- IXA pipes Named Entity Tagger (http://ixa2.si.ehu.es/ixa-pipes). ☆31 Updated 5 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆94 Updated 6 years ago
- Common web archive utility code. ☆50 Updated 3 weeks ago