Skylion007 / OpenWebTextCorpus
☆22 · Updated last year
Alternatives and similar repositories for OpenWebTextCorpus
Users who are interested in OpenWebTextCorpus are comparing it to the repositories listed below.
- DKPro C4CorpusTools is a collection of tools for processing the CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 · Updated 5 years ago
- A collection of simple tutorials for using Fonduer ☆100 · Updated 4 years ago
- Experiments to help discussion on Wikipedia talk pages ☆67 · Updated this week
- Raw Wikipedia counts for entity linking ☆19 · Updated 8 years ago
- AmbiverseNLU: A Natural Language Understanding suite by the Max Planck Institute for Informatics ☆212 · Updated last year
- Automatically exported from code.google.com/p/wiki-links ☆42 · Updated 9 years ago
- A Corpus of Quotes ☆69 · Updated 6 years ago
- Datasets I have created for scientific summarization, and a trained BertSum model ☆115 · Updated 5 years ago
- Tokenizer for Twitter and Reddit data ☆46 · Updated 6 years ago
- ALMa (Active Learning Manager) keeps track of labeled and unlabeled data for active learning ☆42 · Updated 5 years ago
- The WebSplit Benchmark introducing the "Split and Rephrase" task ☆63 · Updated 6 years ago
- Anonymization of legal cases (Fr) based on Flair embeddings ☆87 · Updated 4 years ago
- Python SDK for the TextRazor Text Analytics API ☆20 · Updated last year
- ☆123 · Updated 2 years ago
- High-coverage and high-precision lexica of terms annotated with emotion scores for English and Italian. ☆154 · Updated 10 months ago
- A web application for tagging and retrieving arguments in text ☆29 · Updated 2 years ago
- 📄 Neural Sentential Paraphrase Generation to Augment Chatbot Training Dataset ☆21 · Updated 2 years ago
- ☆97 · Updated 4 years ago
- A way to do annotations for NER. TALEN: Tool for Annotation of Low-resource ENtities ☆118 · Updated 2 months ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆170 · Updated 3 years ago
- ☆59 · Updated 10 years ago
- A framework for doing NER and other types of entity recognition in Python ☆68 · Updated 3 years ago
- A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python ☆111 · Updated 3 months ago
- Using ML to extract campaign finance data from messy forms for journalism ☆77 · Updated 3 years ago
- Implementation of a simple frame identification approach (SimpleFrameId) described in the paper "Out-of-domain FrameNet Semantic Role Lab… ☆15 · Updated 8 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 · Updated 13 years ago
- spaCy pipeline component for adding text readability meta data to Doc objects. ☆56 · Updated 6 years ago
- Official details for: [1803.08493] Context is Everything: Finding Meaning Statistically in Semantic Spaces ☆39 · Updated 6 years ago
- An unsupervised compound splitter ☆41 · Updated 5 years ago
- A clean and easy interface for performing nearest-neighbor lookups ☆50 · Updated 5 years ago