jonsafari / tok-tok
A fast, simple, multilingual tokenizer
☆29 · May 24, 2017 · Updated 8 years ago
Alternatives and similar repositories for tok-tok
Users who are interested in tok-tok are comparing it to the libraries listed below.
- Fast Word Clustering Software · ☆79 · Feb 8, 2025 · Updated last year
- Easy language identification of 380 languages · ☆17 · Dec 2, 2019 · Updated 6 years ago
- ☆12 · Feb 9, 2019 · Updated 7 years ago
- Joint multi-task emotion deep neural model for emotion classification in multigenre. · ☆14 · May 10, 2024 · Updated last year
- AROW++ An implementation of the efficient confidence-weighted classifier · ☆11 · Jan 9, 2021 · Updated 5 years ago
- Accompanying code for our EMNLP 2017 publication "GraphDocExplore: A Framework for the Experimental Comparison of Graph-based Document Ex… · ☆27 · May 27, 2023 · Updated 2 years ago
- A repository for the "Combining DBpedia and Topic Modeling" GSoC 2016 idea · ☆13 · Sep 1, 2016 · Updated 9 years ago
- Use spaCy for NLP and output to the FoLiA XML format. · ☆12 · Feb 27, 2024 · Updated last year
- Python package to augment multilingual data · ☆15 · Feb 15, 2023 · Updated 3 years ago
- tools to analyze a collection of texts and identify relevant words · ☆12 · May 20, 2018 · Updated 7 years ago
- Python evaluation scripts for AIDA-formatted CoNLL data · ☆20 · Aug 4, 2014 · Updated 11 years ago
- Course on Language Technologies and NLP · ☆15 · May 15, 2017 · Updated 8 years ago
- a conversion of Dadegan corpus (first Persian dependency corpus) to the universal dependency version · ☆15 · Nov 26, 2025 · Updated 2 months ago
- maximum entropy based part-of-speech tagger for NLTK · ☆45 · Dec 8, 2016 · Updated 9 years ago
- List of text corpora (text dataset in Persian) that we used in FarsiYar text-mining tools · ☆18 · Jul 16, 2019 · Updated 6 years ago
- Generate crappy products and reviews using Amazon's dataset · ☆17 · Jan 11, 2016 · Updated 10 years ago
- ChatGPT plugin for Singapore HDB car park availability · ☆19 · Jun 7, 2023 · Updated 2 years ago
- Java library to tokenize Thai text into a list of TCCs · ☆19 · May 30, 2017 · Updated 8 years ago
- Pre-Trained NER models for Persian 🦁 · ☆23 · May 28, 2021 · Updated 4 years ago
- The dataset and statistical analysis code released with the submission of EMNLP 2017 paper "Why We Need New Evaluation Metrics for NLG" · ☆19 · Nov 16, 2021 · Updated 4 years ago
- DyNet implementation of stack LSTM experiments by Grefenstette et al. · ☆21 · Oct 6, 2017 · Updated 8 years ago
- BlackboxNLP 2019: Analyzing and interpreting neural networks for NLP · ☆18 · Aug 1, 2019 · Updated 6 years ago
- Codenize your datasources. · ☆27 · Dec 1, 2024 · Updated last year
- A tutorial about DBpedia and Linked Data in general · ☆23 · Nov 7, 2014 · Updated 11 years ago
- Democratizing NLP! · ☆105 · Dec 6, 2023 · Updated 2 years ago
- ☆25 · Apr 28, 2020 · Updated 5 years ago
- Ubiflux Vigor ventilation system RS485 Modbus communications with Python · ☆11 · Jan 28, 2026 · Updated 2 weeks ago
- A repo for sharing language resources related to the outbreak (in machine readable format) · ☆25 · Sep 22, 2025 · Updated 4 months ago
- Named-Entity Recognition in Persian Language · ☆60 · Jul 23, 2020 · Updated 5 years ago
- An Implementation of Transformer (Attention Is All You Need) in DyNet · ☆65 · Nov 30, 2023 · Updated 2 years ago
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. · ☆69 · Sep 14, 2021 · Updated 4 years ago
- The Powerful Python CMS · ☆11 · Nov 20, 2021 · Updated 4 years ago
- Disambiguation of Semantic Resources - Full framework · ☆30 · Oct 31, 2016 · Updated 9 years ago
- Lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and tens of other languages, that uses affix rules (affix: prefi… · ☆36 · Jun 26, 2025 · Updated 7 months ago
- Tools for extracting parallel corpora from article titles across languages in Wikipedia · ☆74 · Feb 25, 2015 · Updated 10 years ago
- framework for doing NER and other types of entity recognition, in Python · ☆68 · Jun 21, 2022 · Updated 3 years ago
- Open-source dependency parser, part-of-speech tagger, and text normalizer for Farsi (Persian) · ☆43 · Jun 4, 2014 · Updated 11 years ago
- A command-line program to download text corpora. · ☆34 · Aug 12, 2017 · Updated 8 years ago
- Analyzing Uncertainty in Neural Machine Translation · ☆34 · Sep 15, 2021 · Updated 4 years ago