Skylion007 / OpenWebTextCorpus
☆23 Updated last year
Alternatives and similar repositories for OpenWebTextCorpus
Users who are interested in OpenWebTextCorpus are comparing it to the libraries listed below.
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 5 years ago
- A visualisation tool for Spacy using Hierplane. ☆65 Updated 2 years ago
- Anonymization of legal cases (Fr) based on Flair embeddings ☆87 Updated 4 years ago
- Experiments to help discussion on Wikipedia talk pages ☆68 Updated this week
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆170 Updated 3 years ago
- A web application for tagging and retrieval of arguments in text ☆29 Updated 2 years ago
- A compound word splitter for Python ☆49 Updated 4 years ago
- Running Prodigy for a team of annotators ☆53 Updated 4 years ago
- A collection of simple tutorials for using Fonduer ☆100 Updated 5 years ago
- interactive explorer for language models ☆135 Updated 3 years ago
- High-coverage and high-precision lexica of terms annotated with emotion scores for English and Italian. ☆155 Updated last year
- spaCy pipeline component for adding text readability meta data to Doc objects. ☆56 Updated 6 years ago
- AmbiverseNLU: A Natural Language Understanding suite by Max Planck Institute for Informatics ☆212 Updated last year
- Python SDK for the TextRazor Text Analytics API ☆20 Updated 2 years ago
- sumgram is a tool that summarizes a collection of text documents by generating the most frequent sumgrams (conjoined ngrams) ☆56 Updated last year
- Tokenizer for Twitter and Reddit data ☆46 Updated 6 years ago
- Using ML to extract campaign finance data from messy forms for journalism ☆77 Updated 3 years ago
- A Named-Entity Recogniser based on Grobid. ☆54 Updated 6 months ago
- A way to do annotations for NER. TALEN: Tool for Annotation of Low-resource ENtities ☆118 Updated 4 months ago
- Entity Linking for the masses ☆56 Updated 10 years ago
- numeric fused-head identification and resolution ☆33 Updated 6 years ago
- A tool for visualizing trees, tailored specifically to the analysis of parse trees. ☆83 Updated 5 years ago
- 💙 Emoji handling and meta data for spaCy with custom extension attributes ☆182 Updated 2 years ago
- ☆70 Updated 3 years ago
- Disambiguation of Semantic Resources - Full framework ☆30 Updated 9 years ago
- ALMa (Active Learning Manager) Keeps track of labeled and unlabeled data for active learning ☆42 Updated 5 years ago
- Official details for: [1803.08493] Context is Everything: Finding Meaning Statistically in Semantic Spaces ☆39 Updated 6 years ago
- Automatically exported from code.google.com/p/wiki-links ☆43 Updated 9 years ago
- Implementation of a simple frame identification approach (SimpleFrameId) described in the paper "Out-of-domain FrameNet Semantic Role Lab… ☆15 Updated 8 years ago
- Example using Polyaxon to experiment with pre-training spaCy ☆65 Updated 4 years ago