TeamHG-Memex / sitehound
This is the facade for installation and access to the individual components
☆15 Updated 6 years ago
Alternatives and similar repositories for sitehound:
Users who are interested in sitehound compare it to the libraries listed below.
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 Updated 3 years ago
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser ☆13 Updated 3 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 Updated 3 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆45 Updated 3 years ago
- Streaming web crawler with WebSocket API ☆44 Updated last year
- extract difference between two html pages ☆32 Updated 6 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated last year
- Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services ☆26 Updated 3 weeks ago
- Traptor -- A distributed Twitter feed ☆26 Updated 2 years ago
- Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data. ☆59 Updated 2 weeks ago
- Trying to generate name synonyms from wikidata ☆32 Updated 4 years ago
- Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations ☆40 Updated 10 months ago
- General Architecture for Text Engineering ☆48 Updated 9 years ago
- API client for Aleph, supports bulk entity and document upload. ☆28 Updated 5 months ago
- Broad crawler for domain discovery ☆19 Updated 6 years ago
- A POC at replicating Facebook Graph Search with Cypher and Neo4j ☆101 Updated 11 years ago
- Python wrapper for Apache Tika, made to be easy_installed ☆25 Updated 12 years ago
- Deployment of pywb as a CommonCrawl Index Server ☆21 Updated 7 years ago
- An index data structure for approximate string search. ☆23 Updated 5 years ago
- [UNMAINTAINED] Firefox addon for Scrapely ☆5 Updated 9 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 Updated 9 years ago
- RDFLib store using SQLAlchemy dbapi as back-end ☆152 Updated last year
- Common Crawl Index Server ☆67 Updated 3 weeks ago
- Systematic Classification Engine for Advanced Data ANalysis ☆22 Updated 8 years ago
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation. ☆149 Updated 2 months ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 Updated 3 years ago
- Semanticizest: dump parser and client ☆20 Updated 8 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- Simple NGram Fast Indexer & Searcher ☆37 Updated 2 years ago