Grasia / wiki-scripts
Miscellaneous scripts to gather and process data of wikis.
☆20Updated 2 years ago
Alternatives and similar repositories for wiki-scripts
Users who are interested in wiki-scripts are comparing it to the libraries listed below
- This is the text partitioner project for Python.☆21Updated 6 years ago
- TopicScan: Visualization and validation interface for NMF Topic Modeling☆23Updated 5 years ago
- Presentations & notebooks from our talks/workshops/meetups/etc.☆24Updated 7 years ago
- Tokenizer for Twitter and Reddit data☆46Updated 6 years ago
- Python package for stylometry☆63Updated 4 years ago
- ☆70Updated 2 years ago
- public repository of the interdisciplinary working group 'Hatespeech' of the research training group UCSM☆17Updated 6 years ago
- Calculate readability scores☆42Updated 6 years ago
- ☆32Updated 10 years ago
- Notebooks configured to be run with Binder, usually found on my blog.☆42Updated 2 years ago
- Notebooks and data associated to constructing and exploring a map of subreddits.☆55Updated 8 years ago
- Negation detection NLP tool. If you use the code, please cite George Gkotsis, Sumithra Velupillai, Anika Oellrich, Harry Dean,…☆54Updated 8 years ago
- Easy-to-use text representations extraction library based on the Transformers library.☆32Updated 2 years ago
- Regex-like pattern tree matching, but on a sentence's tree instead of strings☆42Updated 7 years ago
- Use spaCy for NLP and output to the FoLiA XML format.☆12Updated last year
- A PyPI package for easy text annotation in a Jupyter Notebook.☆28Updated 4 years ago
- An example of how to use spaCy for extremely large files without running into memory issues☆36Updated 3 years ago
- Experiments to help discussion on Wikipedia talk pages☆67Updated last month
- Bag of, not words, but tricks!☆68Updated last year
- A thin wrapper around the DBpedia Spotlight HTTP API☆25Updated 7 years ago
- This repository contains machine learning related work for the corpus to graph project, including Jupyter research notebooks and a Flask …☆46Updated 9 years ago
- Cleans Reddit Text Data☆83Updated 5 years ago
- Package that returns a company embedding given a company name☆47Updated 5 years ago
- An alternative approach for probabilistic topic modeling based on agglomerative clustering of topics (not documents)☆12Updated 4 years ago
- A spell-checker extending Peter Norvig's with multi-typo correction, hamming distance weighting, and more.☆98Updated 5 years ago
- Dataframe Integration with spaCy.☆103Updated 4 years ago
- Compare accuracies of udpipe models and spacy models which can be used for NLP annotation☆14Updated 7 years ago
- Template for AC297r projects☆33Updated 5 years ago
- See https://meta.wikimedia.org/wiki/Research:Modeling_Talk_Page_Abuse☆150Updated 5 years ago
- The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016…☆68Updated 3 years ago