TeamHG-Memex / sitehound
This is the facade for installation and access to the individual components
☆15 · Updated 6 years ago
Alternatives and similar repositories for sitehound:
Users who are interested in sitehound are comparing it to the libraries listed below.
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser ☆13 · Updated 3 years ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated 6 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 · Updated 3 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆45 · Updated 3 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 · Updated last year
- Streaming web crawler with WebSocket API ☆44 · Updated last year
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 · Updated 3 years ago
- Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services ☆26 · Updated last month
- Traptor -- A distributed Twitter feed ☆26 · Updated 2 years ago
- A simple proxy web service in 19 lines of Python code. ☆23 · Updated 10 years ago
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation. ☆150 · Updated 2 months ago
- General Architecture for Text Engineering ☆49 · Updated 9 years ago
- Extract the differences between two HTML pages ☆32 · Updated 6 years ago
- Orchestrate web crawlers to create structured datasets from multiple data sources with YAML configs. ☆14 · Updated 2 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 · Updated 3 years ago
- Deployment of pywb as a CommonCrawl Index Server ☆21 · Updated 7 years ago
- MITIE: library and tools for information extraction ☆29 · Updated 10 years ago
- Simple taxonomy management tool and document classifier. ☆56 · Updated 5 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆34 · Updated last year
- An Exploration into Graph Databases ☆28 · Updated 9 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 · Updated 9 years ago
- Faceted search engine for domain-specific exploration of the Web ☆45 · Updated 8 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning ☆118 · Updated 10 months ago
- Temporal Anomaly Detector (TAD) ☆15 · Updated 7 years ago
- Simple NGram Fast Indexer & Searcher ☆37 · Updated 2 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 · Updated 3 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 · Updated 7 years ago
- Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations ☆40 · Updated 10 months ago
- T2K Match is a matching algorithm optimised to match millions of web tables to a central knowledge base. ☆21 · Updated 6 years ago
- [UNMAINTAINED] Firefox addon for Scrapely ☆5 · Updated 9 years ago