Grasia / wiki-scripts
Miscellaneous scripts to gather and process data of wikis.
☆21 · Updated 2 years ago
Alternatives and similar repositories for wiki-scripts
Users interested in wiki-scripts are comparing it to the libraries listed below.
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 · Updated 2 years ago
- German lemmatization with IWNLP as extension for spaCy ☆24 · Updated 2 years ago
- Dataframe Integration with spaCy. ☆103 · Updated 4 years ago
- A collection of over 1.5 million tweets translated into French, along with their sentiment. ☆35 · Updated 8 years ago
- A compound word splitter for Python ☆48 · Updated 3 years ago
- TopicScan: Visualization and validation interface for NMF Topic Modeling ☆23 · Updated 5 years ago
- Experiments to help discussion on Wikipedia talk pages ☆66 · Updated last week
- Anonymization of legal cases (Fr) based on Flair embeddings ☆88 · Updated 4 years ago
- A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python ☆111 · Updated 2 months ago
- 💙 Emoji handling and metadata for spaCy with custom extension attributes ☆181 · Updated 2 years ago
- Use spaCy for NLP and output to the FoLiA XML format. ☆12 · Updated last year
- Wikidata embedding ☆50 · Updated 9 months ago
- A set of utility scripts to process Wikipedia related data ☆38 · Updated 3 years ago
- Repository of data and code to use the models described in the paper "Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia… ☆10 · Updated 2 years ago
- Notebooks configured to be run with Binder, usually found on my blog. ☆42 · Updated 2 years ago
- spaCy + UDPipe ☆162 · Updated 3 years ago
- An easy-to-use library for extracting text representations, based on the Transformers library. ☆32 · Updated 2 years ago
- This repo contains the code used to generate the French Wikipedia sample used in the QA annotation project PIAF ☆11 · Updated 4 years ago
- A python module for word inflections designed for use with spaCy. ☆92 · Updated 5 years ago
- Python package for stylometry ☆63 · Updated 4 years ago
- A fully customisable language detection pipeline for spaCy ☆93 · Updated 6 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆94 · Updated 2 years ago
- Tokenizer for Twitter and Reddit data ☆46 · Updated 6 years ago
- Storage and retrieval of Word Embeddings in various databases ☆51 · Updated 6 years ago
- Compares the accuracy of UDPipe and spaCy models that can be used for NLP annotation ☆14 · Updated 7 years ago
- Toolkit to compile a comparable/parallel corpus from European Parliament proceedings ☆16 · Updated 5 years ago
- ☆70 · Updated 2 years ago
- Calculate readability scores ☆42 · Updated 6 years ago
- Interpretable data visualizations for understanding how texts differ at the word level ☆280 · Updated 5 months ago
- Quickly extract multi-word phrases from a corpus ☆193 · Updated 5 years ago