fitnr / unwiki
Python module to remove wiki markup text.
☆11 · Updated 9 years ago
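The sketch below is an illustration only, assuming a regex-only approach that handles a few common MediaWiki constructs: comments, templates, internal and external links, bold/italic quotes, and headings. `strip_wiki_markup` is a hypothetical name and is not part of unwiki's interface. Tables, file/image links with nested captions, and other edge cases are deliberately not handled.

```python
import re

# Hypothetical helper (not unwiki's API): rough regex-based wikitext stripping.
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)          # <!-- hidden -->
_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                # innermost {{...}} only
_INTERNAL_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")  # [[Target|Label]] or [[Target]]
_EXTERNAL_LINK = re.compile(r"\[https?://\S+\s+([^\]]*)\]")     # [http://url label]
_QUOTES = re.compile(r"'{2,}")                           # ''italic'' and '''bold'''
_HEADING = re.compile(r"^=+\s*(.*?)\s*=+\s*$", re.MULTILINE)    # == Heading ==


def strip_wiki_markup(text: str) -> str:
    """Return text with common wiki markup removed (a simplified sketch)."""
    text = _COMMENT.sub("", text)
    # Remove templates innermost-first until nothing changes,
    # so nested templates such as {{a|{{b}}}} vanish completely.
    previous = None
    while previous != text:
        previous = text
        text = _TEMPLATE.sub("", text)
    text = _INTERNAL_LINK.sub(r"\1", text)   # keep the visible label
    text = _EXTERNAL_LINK.sub(r"\1", text)   # keep the link text, drop the URL
    text = _QUOTES.sub("", text)
    text = _HEADING.sub(r"\1", text)
    return text.strip()


if __name__ == "__main__":
    sample = "== History ==\n'''Paris''' is the [[capital city|capital]] of [[France]].{{cite|{{x}}}}"
    print(strip_wiki_markup(sample))
```

Applying the internal-link pattern after template removal means links that appear only inside templates are discarded along with the template.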
Alternatives and similar repositories for unwiki:
Users who are interested in unwiki are comparing it to the libraries listed below.
- SemEval 2019 Hyperpartisan News Detection: team Bertha von Suttner contribution ☆22 · Updated 5 years ago
- Finds linguistic patterns effortlessly ☆36 · Updated last year
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ☆21 · Updated last year
- Rust Python bindings for symspell ☆19 · Updated last year
- KenLM extension for spaCy 2.0. ☆16 · Updated 7 years ago
- Extract, parse and populate templates from strings ☆27 · Updated 6 years ago
- Download, manage, and search a BibTeX database. ☆64 · Updated 6 years ago
- A Python implementation of the Metaphone and Double Metaphone algorithms ☆81 · Updated last year
- C++ implementation of Generalised Brown clustering and Python scripts for feature generation ☆41 · Updated 9 years ago
- This is an object-oriented implementation of a trie in Python. The class contains setter and getter methods, and implements several usefu… ☆14 · Updated 7 years ago
- Exploring the shapes of stories using indico sentiment analysis APIs ☆28 · Updated 9 years ago
- A collection of selected models built with AllenNLP. ☆25 · Updated 5 years ago
- bin files ☆13 · Updated 2 months ago
- Gather module dependencies of source code ☆11 · Updated last year
- Running Prodigy for a team of annotators ☆53 · Updated 4 years ago
- IPython magic for parallel profiling (like `%time`, but parallel) ☆71 · Updated 7 years ago
- The NLPStatTest project ☆12 · Updated 3 years ago
- Tokenizer for Twitter and Reddit data ☆46 · Updated 6 years ago
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ☆19 · Updated 2 years ago
- allennlp + streamlit demo ☆22 · Updated 5 years ago
- Sequence tagging with spaCy and crfsuite ☆19 · Updated 2 years ago
- A compound splitter based on the semantic regularities in the vector space of word embeddings. ☆16 · Updated 8 years ago
- Labeled examples from wiki dumps in Python ☆67 · Updated 8 years ago
- Black for Python docstrings and reStructuredText (rst). ☆18 · Updated 2 years ago
- Ensemble topic modeling with matrix factorization ☆25 · Updated 6 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 · Updated 2 years ago
- 🧬 A VS Code extension for annotating data with Prodigy ☆30 · Updated 3 years ago
- pelican-bibtex: Manage your academic publications page with Pelican and BibTeX ☆52 · Updated last year
- A tidy and complete archive of metadata for papers on arxiv.org, 1993-2019 ☆28 · Updated 5 years ago
- Numeric fused-head identification and resolution ☆33 · Updated 5 years ago