Skylion007 / OpenWebTextCorpus
☆22Updated last year
Alternatives and similar repositories for OpenWebTextCorpus
Users who are interested in OpenWebTextCorpus are comparing it to the libraries listed below
- A simple ElasticSearch plugin wrapping around the search endpoint to provide Rocchio query expansion☆17Updated 8 years ago
- A web application for tagging and retrieval of arguments in text☆29Updated 2 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate…☆52Updated 5 years ago
- Tokenizer for Twitter and Reddit data☆46Updated 6 years ago
- sumgram is a tool that summarizes a collection of text documents by generating the most frequent sumgrams (conjoined ngrams)☆56Updated last year
- Anonymization of legal cases (Fr) based on Flair embeddings☆87Updated 4 years ago
- 📄Neural Sentential Paraphrase Generation to Augment Chatbot Training Dataset☆21Updated 2 years ago
- A temporal ordering system for events and time expressions in written text.☆42Updated 3 years ago
- A way to do annotations for NER. TALEN: Tool for Annotation of Low-resource ENtities☆118Updated 3 months ago
- framework for doing NER and other types of entity recognition, in Python☆68Updated 3 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe…☆170Updated 3 years ago
- numeric fused-head identification and resolution☆33Updated 5 years ago
- Raw Wikipedia counts for entity linking☆19Updated 8 years ago
- AmbiverseNLU: A Natural Language Understanding suite by Max Planck Institute for Informatics☆212Updated last year
- Automatically exported from code.google.com/p/wiki-links☆42Updated 9 years ago
- An unsupervised compound splitter☆41Updated 6 years ago
- Keras implementation of ontology aware token embeddings☆49Updated 6 years ago
- NEWS: JATE2.0 Beta.11 Released, see details below.☆82Updated 2 years ago
- Extension of the mate-tools NLP pipeline☆67Updated 9 years ago
- A Large Scale Alignment of Natural Language with Knowledge Base Triples for Relation Extraction and Natural Language Generation☆46Updated 7 years ago
- ☆59Updated 10 years ago
- ALMa (Active Learning Manager) Keeps track of labeled and unlabeled data for active learning☆42Updated 5 years ago
- Entity Linking for the masses☆56Updated 9 years ago
- A collection of simple tutorials for using Fonduer☆100Updated 4 years ago
- ☆70Updated 2 years ago
- C++ implementation of Generalised Brown clustering and python scripts for feature generation☆41Updated 9 years ago
- A tool for text normalisation via character-level machine translation☆13Updated 5 years ago
- Socially-Equitable Language Identification☆78Updated 2 years ago
- ☆123Updated 2 years ago
- The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016…☆68Updated 3 years ago