Grasia / wiki-scripts
Miscellaneous scripts to gather and process data of wikis.
☆20 · Updated 2 years ago
Alternatives and similar repositories for wiki-scripts
Users who are interested in wiki-scripts are comparing it to the repositories listed below.
- TopicScan: Visualization and validation interface for NMF Topic Modeling ☆23 · Updated 5 years ago
- Harassment Lexicon and Corpus ☆30 · Updated 7 years ago
- ☆30 · Updated 3 years ago
- Tokenizer for Twitter and Reddit data ☆46 · Updated 6 years ago
- Essential NLP & ML, short & fast pure Python code ☆78 · Updated last month
- Regex-like pattern matching on a sentence's parse tree instead of on strings ☆42 · Updated 7 years ago
- The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016… ☆68 · Updated 3 years ago
- Calculate readability scores ☆43 · Updated 6 years ago
- Experiments to help discussion on Wikipedia talk pages ☆67 · Updated last month
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 · Updated 3 years ago
- Toolkit to compile a comparable/parallel corpus from European Parliament proceedings ☆16 · Updated 5 years ago
- Presentations & notebooks from our talks/workshops/meetups/etc. ☆24 · Updated 7 years ago
- CrowdTruth framework for crowdsourcing ground truth for training & evaluation of AI systems ☆62 · Updated last year
- Public repository of the interdisciplinary working group 'Hatespeech' of the research training group UCSM ☆17 · Updated 6 years ago
- Notebooks and data associated with constructing and exploring a map of subreddits. ☆55 · Updated 8 years ago
- German lemmatization with IWNLP as extension for spaCy ☆25 · Updated 2 years ago
- ☆32 · Updated 10 years ago
- Analysis of the Gutenberg dataset ☆45 · Updated 6 years ago
- ☆54 · Updated 3 years ago
- Training Temporal Word Embeddings with a Compass ☆65 · Updated 2 months ago
- A spell-checker extending Peter Norvig's with multi-typo correction, Hamming distance weighting, and more. ☆98 · Updated 5 years ago
- Repository of data and code to use the models described in the paper "Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia… ☆11 · Updated 2 years ago
- A collection of over 1.5 million tweets translated to French, with their sentiment. ☆35 · Updated 8 years ago
- 💙 Emoji handling and meta data for spaCy with custom extension attributes ☆182 · Updated 2 years ago
- Quickly extract multi-word phrases from a corpus ☆194 · Updated 5 years ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ☆81 · Updated last year
- Implementation of Deep Dirichlet Multinomial Regression in Python + Cython. ☆16 · Updated 7 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 · Updated 5 years ago
- Template for AC297r projects ☆33 · Updated 5 years ago
- ☆59 · Updated 10 years ago