rossf7 / elasticrawl
Launch AWS Elastic MapReduce jobs that process Common Crawl data.
☆49 Updated 8 years ago
Alternatives and similar repositories for elasticrawl
Users who are interested in elasticrawl are comparing it to the libraries listed below.
- Human-Powered Data Analysis with Mechanical Turk ☆300 Updated 12 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. ☆38 Updated 6 years ago
- Index URLs in Common Crawl ☆194 Updated 7 years ago
- The Summarizer from the Web IR / NLP Group (WING), hence SWING, is a modular, state-of-the-art automatic extractive text summarization sy… ☆39 Updated 10 years ago
- Semanticizest: dump parser and client ☆20 Updated 9 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 Updated 12 years ago
- Model Training tool for MITIE ☆79 Updated 9 years ago
- ☆21 Updated 7 years ago
- Extract postal addresses from the DOM ☆66 Updated 12 years ago
- Demonstration of using Python to process the Common Crawl dataset with the mrjob framework ☆167 Updated 3 years ago
- A dialog system framework for conversational services. ☆61 Updated 8 years ago
- Keeps a mirror of DBpedia live in sync ☆26 Updated 3 years ago
- Supervised learning for novelty detection in text ☆78 Updated 8 years ago
- Raw Wikipedia counts for entity linking ☆19 Updated 8 years ago
- The Community-enRiched Open WordNet (CROWN) ☆18 Updated 9 years ago
- Parser and standardizer for politician, individual and organization names. ☆129 Updated 8 years ago
- Elasticsearch Latent Semantic Indexing experimentation ☆33 Updated 5 years ago
- Wikipedia Live Monitor ☆21 Updated 6 months ago
- Updates to Zope's keyphrase extractor (forked from 1.1.0) ☆67 Updated 8 years ago
- ☆43 Updated 9 years ago
- For extracting measurements and related entities from text ☆58 Updated 5 years ago
- a set of services that provide NLP facilities ☆25 Updated 4 years ago
- Entity Linking for the masses ☆56 Updated 9 years ago
- RDF-Centric Map/Reduce Framework and Freebase data conversion tool ☆149 Updated 3 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 5 years ago
- Analyze the structure and dynamics of an open source project's developer community, using graph algorithms, etc. ☆58 Updated 4 years ago
- Implicit relation extractor using a natural language model. ☆24 Updated 7 years ago
- Traptor -- A distributed Twitter feed ☆26 Updated 2 years ago
- Tools to download and process name data from various sources. ☆92 Updated 11 years ago
- Using word2vec and t-SNE to compare text sources. ☆20 Updated 9 years ago