shlomiv / warc-mapreduce
WARC and WET support for Hadoop's MapReduce API
☆13Updated 10 years ago
Alternatives and similar repositories for warc-mapreduce
Users who are interested in warc-mapreduce are comparing it to the libraries listed below.
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆45Updated 3 years ago
- ☆13Updated 2 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.☆46Updated 7 years ago
- UMLS in Python with MongoDB.☆18Updated 6 years ago
- Easily identify and label sentence intervals using various taggers.☆16Updated 8 years ago
- ☆21Updated 8 years ago
- Common web archive utility code.☆55Updated last month
- Interactive D3.js visualization for word2vec datasets☆14Updated last month
- Open Use of Data Agreement - Removing Barriers to Data Innovation☆17Updated 3 years ago
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop☆57Updated 4 years ago
- ☆12Updated 2 years ago
- Disambiguating biomedical and clinical concepts with word embeddings☆14Updated 7 years ago
- Social Context Analysis aNd Emotion Recognition☆12Updated 8 years ago
- DKPro WSD: A Java framework for word sense disambiguation☆20Updated 2 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu…☆65Updated last year
- Building and Using Knowledge Graphs made easy☆47Updated 4 months ago
- Training Tesseract to better extract serial numbers from images of electronic items☆9Updated 8 years ago
- ☆22Updated last year
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 3 years ago
- python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. Wi…☆18Updated 2 months ago
- General Architecture for Text Engineering☆50Updated 9 years ago
- Image recognition on Spark cluster powered by Deeplearning4j and Apache Tika☆14Updated 8 years ago
- Code examples for the book https://leanpub.com/cognitive-computing☆10Updated 2 years ago
- Implements dictionary-based entity extraction as described in the FAERIE paper http://dbgroup.cs.tsinghua.edu.cn/dd/papers/sigmod2011-fae…☆9Updated 8 years ago
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali…☆83Updated 5 years ago
- Library for biomedical knowledge manipulation☆23Updated 11 years ago
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format☆22Updated 7 years ago
- Build Neo4j graphs from Datashare projects☆13Updated this week
- List of online / computer-based annotation tools☆18Updated 8 years ago
- ☆20Updated 8 years ago