TeamHG-Memex / sitehound
This is the facade for installation and access to the individual components
☆15Updated 6 years ago
Alternatives and similar repositories for sitehound:
Users who are interested in sitehound are comparing it to the libraries listed below:
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 3 years ago
- A component that tries to avoid downloading duplicate content☆27Updated 6 years ago
- Quickly analyze and explore email with advanced analytics and visualization.☆56Updated 3 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends☆56Updated last year
- Site Hound (previously THH) is a Domain Discovery Tool☆23Updated 3 years ago
- Streaming web crawler with WebSocket API☆44Updated last year
- Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.☆59Updated 2 weeks ago
- General Architecture for Text Engineering☆48Updated 8 years ago
- Extract the differences between two HTML pages☆32Updated 6 years ago
- Traptor -- A distributed Twitter feed☆26Updated 2 years ago
- A list of SaaS, PaaS and IaaS offerings that have free tiers for devops and infradev☆9Updated 9 years ago
- Simple NGram Fast Indexer & Searcher☆37Updated 2 years ago
- Universal backend for indexing, storing, and querying documents.☆25Updated 5 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri☆47Updated 3 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆45Updated 3 years ago
- Solr Relevance Ranking Analysis and Visualization Tool☆17Updated 5 years ago
- Deployment of pywb as a CommonCrawl Index Server☆21Updated 7 years ago
- Faceted search engine for domain-specific exploration of the Web☆45Updated 8 years ago
- Sharable Grakn knowledge graphs☆13Updated 2 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess…☆16Updated 9 years ago
- Events and Situations Ontology☆14Updated 6 years ago
- Data notification service: subscribe to keywords and get notified whenever an open data source mentions that keyword.☆24Updated 11 years ago
- Implementation of Context-Graph algorithms for graph enrichment and querying.☆24Updated 9 years ago
- A rotating socks proxy using Tor, Delegate and Haproxy☆14Updated 5 years ago
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test…☆64Updated 2 months ago
- This page is a companion for the paper titled Towards Automatic Structuring and Semantic Indexing of Legal Documents☆29Updated 6 years ago
- Code and templates required to build the DARPA open catalog.☆17Updated 8 years ago
- A simple proxy web service in 19 lines of Python code.☆23Updated 10 years ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a…☆96Updated 2 years ago
- MITIE: library and tools for information extraction☆29Updated 10 years ago