TeamHG-Memex / sitehound
This is the facade for installation and access to the individual components
☆15Updated 7 years ago
Alternatives and similar repositories for sitehound
Users who are interested in sitehound are comparing it to the libraries listed below.
- A component that tries to avoid downloading duplicate content☆27Updated 7 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning☆119Updated last year
- Site Hound (previously THH) is a Domain Discovery Tool☆23Updated 4 years ago
- Quickly analyze and explore email with advanced analytics and visualization.☆56Updated 3 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends☆57Updated last year
- Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations☆40Updated last year
- General Architecture for Text Engineering☆49Updated 9 years ago
- Traptor -- A distributed Twitter feed☆26Updated 2 years ago
- Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.☆66Updated this week
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h…☆189Updated 4 years ago
- Now included in rigour☆151Updated 2 weeks ago
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 3 years ago
- A generic crawler☆78Updated 7 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N…☆270Updated 2 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency.☆190Updated 3 years ago
- Extract the difference between two HTML pages☆32Updated 7 years ago
- A classifier for detecting soft 404 pages☆56Updated 2 years ago
- API client for Aleph, supports bulk entity and document upload.☆28Updated 10 months ago
- Skinfer is a tool for inferring and merging JSON schemas☆139Updated last year
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a…☆99Updated 2 years ago
- An academic open source and open data web crawler☆27Updated 7 years ago
- A Python library to detect and extract listing data from HTML pages.☆108Updated 8 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆46Updated 3 years ago
- framework for scraping legislative/government data☆88Updated 11 months ago
- Broad crawler for domain discovery☆19Updated 7 years ago
- Natural Language Generator for Python☆27Updated 8 years ago
- Simple taxonomy management tool and document classifier.☆56Updated 5 years ago
- Trying to generate name synonyms from wikidata☆32Updated 5 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess…☆16Updated 9 years ago
- Detective.io is a platform that hosts your investigation and lets you make powerful queries to mine it. Simply describe your field of stu…☆136Updated 10 years ago