commoncrawl / commoncrawl-examples
A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format)
☆65 Updated 8 years ago
Alternatives and similar repositories for commoncrawl-examples:
Users who are interested in commoncrawl-examples compare it to the libraries listed below:
- Extract statistics from Wikipedia Dump files. ☆26 Updated 3 years ago
- Simple FieldCache based query introspection Solr Search Component - solves the 'red sofa' problem ☆12 Updated 3 weeks ago
- Fast and robust NLP components implemented in Java. ☆52 Updated 4 years ago
- Website for standardized execution and evaluation of algorithms on datasets. ☆36 Updated 5 years ago
- Meta-repository for the open-source version of the SUMMA Platform ☆16 Updated 10 months ago
- DKPro WSD: A Java framework for word sense disambiguation ☆20 Updated 2 years ago
- D3 and Play based visualization for entity-relation graphs, especially for NLP and information extraction ☆29 Updated 9 years ago
- 💫 Runtime performance comparison of spaCy against other NLP libraries ☆20 Updated 2 years ago
- Text similarity based on Word2Vec vectors. ☆11 Updated 8 years ago
- Provided Guidance on Creating End to End Solutions for Common SILK Use Cases ☆13 Updated 9 years ago
- System for mining Wikipedia Usage data to read our collective mind ☆21 Updated 10 years ago
- ☆13 Updated 9 years ago
- A Text Classification API in Java originally developed by DigitalPebble Ltd. The API is independent from the ML implementations used and … ☆48 Updated 3 years ago
- Raw Wikipedia counts for entity linking ☆19 Updated 7 years ago
- This is a Fact based Question Answering System using Apache Solr as backend search engine, Wikipedia dumps as information source, Apache … ☆26 Updated 2 years ago
- An Apache Lucene TokenFilter that uses word2vec vectors for term expansion. ☆24 Updated 10 years ago
- A set of tools for performing Labeled Latent Dirichlet Allocation on textual datasets, with an emphasis on Twitter profiles. Contains too… ☆42 Updated 3 years ago
- Exploration Library in Java ☆12 Updated last year
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- A Query Autofiltering SearchComponent for Solr that can translate free-text queries into structured queries using index metadata ☆28 Updated 6 years ago
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition ☆136 Updated 8 years ago
- WARC (Web Archive) Input and Output Formats for Hadoop ☆35 Updated 10 years ago
- VoltDB Click Stream Processing Example. ☆16 Updated 7 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 Updated 7 years ago
- A java library for stored queries ☆16 Updated last year
- KnowledgeStore ☆20 Updated 7 years ago
- ☆20 Updated 7 years ago
- An HTTP proxy for Elasticsearch, Solr (etc.) to prevent a 100% full disk situation. ☆11 Updated 6 years ago
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities ☆26 Updated 3 weeks ago
- ☆20 Updated 8 years ago