TeamHG-Memex / sitehound
This is the facade for installation and access to the individual components
☆16Updated 6 years ago
Related projects:
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 2 years ago
- Quickly analyze and explore email with advanced analytics and visualization.☆55Updated 2 years ago
- A component that tries to avoid downloading duplicate content☆27Updated 6 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆46Updated 2 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends☆57Updated 7 months ago
- Traptor -- A distributed Twitter feed☆26Updated 2 years ago
- Site Hound (previously THH) is a Domain Discovery Tool☆23Updated 3 years ago
- Extract the differences between two HTML pages☆32Updated 6 years ago
- AngularJS diagnostic search services for Solr, Elasticsearch, and OpenSearch☆25Updated 3 months ago
- Faceted search engine for domain-specific exploration of the Web☆45Updated 7 years ago
- Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.☆54Updated last month
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.☆142Updated 7 months ago
- Browser add-on and web server to support collection and analysis of web browsing data.☆13Updated 8 years ago
- An index data structure for approximate string search.☆23Updated 5 years ago
- Deployment of pywb as a CommonCrawl Index Server☆21Updated 6 years ago
- Streaming web crawler with WebSocket API☆44Updated last year
- General Architecture for Text Engineering☆45Updated 8 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…☆48Updated 2 years ago
- API client for Aleph, supports bulk entity and document upload.☆27Updated last month
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess…☆14Updated 9 years ago
- MITIE: library and tools for information extraction☆29Updated 9 years ago
- A POC at replicating Facebook Graph Search with Cypher and Neo4j☆103Updated 11 years ago
- Highlight and select phrases in HTML pages.☆24Updated 4 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning☆116Updated 3 months ago
- A classifier for detecting soft 404 pages☆56Updated last year
- A rotating socks proxy using Tor, Delegate and Haproxy☆14Updated 4 years ago
- Sharable Grakn knowledge graphs☆13Updated last year