jonsafari/tok-tok

A fast, simple, multilingual tokenizer

☆29

Alternatives and similar repositories for tok-tok

Users who are interested in tok-tok are comparing it to the libraries listed below.

jonsafari / habeas-corpus
Command-line corpus tools
☆12 · May 15, 2017 · Updated 9 years ago
jonsafari / perstem
Persian stemmer and morphological analyzer
☆19 · Mar 30, 2016 · Updated 10 years ago
jonsafari / clustercat
Fast Word Clustering Software
☆79 · Feb 8, 2025 · Updated last year
jonsafari / witch-language
Easy language identification of 380 languages
☆17 · Dec 2, 2019 · Updated 6 years ago
omidkashefi / Mizan
MIZAN: a large persian-english parallel corpus
☆30 · Sep 15, 2020 · Updated 5 years ago
hooshvare / parsbert-ner
🤗 ParsBERT Persian NER Tasks
☆18 · Jun 17, 2021 · Updated 5 years ago
wfeely / farsiNLPTools
Open-source dependency parser, part-of-speech tagger, and text normalizer for Farsi (Persian)
☆44 · Jun 4, 2014 · Updated 12 years ago
UKPLab / emnlp2017-graphdocexplore
Accompanying code for our EMNLP 2017 publication "GraphDocExplore: A Framework for the Experimental Comparison of Graph-based Document Ex…
☆27 · May 27, 2023 · Updated 3 years ago
sixhobbits / yelp-dataset-2017
Submission to the Yelp Dataset Challenge 2017
☆15 · Jun 30, 2017 · Updated 9 years ago
UniversalDependencies / UD_Persian-PerDT
a conversion of Dadegan corpus (first Persian dependency corpus) to the universal dependency version
☆14 · May 6, 2026 · Updated 2 months ago
miras-tech / MirasText
MirasText
☆76 · Aug 12, 2020 · Updated 5 years ago
averms / pandoc-filters
A small, useful collection of pandoc filters
☆13 · Apr 5, 2025 · Updated last year
htaghizadeh / PersianStemmingDataset
Persian Stemming data-set in order to evaluate new stemmers
☆14 · Dec 16, 2016 · Updated 9 years ago
reedu-reengineering-education / smart-city-dashboard
☆10 · Nov 24, 2022 · Updated 3 years ago
tetsuok / arowpp
AROW++ An implementation of the efficient confidence-weighted classifier
☆11 · Jan 9, 2021 · Updated 5 years ago
proycon / colibri-core
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg…
☆131 · Feb 5, 2026 · Updated 5 months ago
public-people / scrape-news
Scrape South African news
☆13 · May 22, 2023 · Updated 3 years ago
hooshvare / parsner
Pre-Trained NER models for Persian 🦁
☆23 · May 28, 2021 · Updated 5 years ago
christianscheible / qsample
A natural language processing tool for automatically detecting quotations in text.
☆15 · Feb 26, 2022 · Updated 4 years ago
ketranm / fan_vs_rnn
The Importance of Being Recurrent for Modeling Hierarchical Structure
☆25 · Jun 27, 2018 · Updated 8 years ago
beta-decay / Sumerian
A programming language written in the ancient language Sumerian (𒅴 𒆰)
☆14 · Aug 8, 2018 · Updated 7 years ago
tachi-hi / tts_samples
Demo page of our paper Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks With Guided Attention, ICASSP 201…
☆15 · May 30, 2021 · Updated 5 years ago
duyvuleo / Transformer-DyNet
An Implementation of Transformer (Attention Is All You Need) in DyNet
☆64 · Nov 30, 2023 · Updated 2 years ago
proycon / spacy2folia
Use spaCy for NLP and output to the FoLiA XML format.
☆12 · Feb 27, 2024 · Updated 2 years ago
knguyenanhoa / cli-arxiv
CLI tool for exploring arXiv (inspired by karpathy's brilliant ArXiv Sanity Preserver)
☆39 · May 8, 2025 · Updated last year
arne-cl / nltk-maxent-pos-tagger
maximum entropy based part-of-speech tagger for NLTK
☆45 · Dec 8, 2016 · Updated 9 years ago
wikilinks / conll03_nel_eval
Python evaluation scripts for AIDA-formatted CoNLL data
☆20 · Aug 4, 2014 · Updated 11 years ago
WNortier / ts-paginator
ts-paginator is a TypeScript pagination hook for React or NextJS
☆17 · Jun 5, 2025 · Updated last year
bidi-tex / xepersian
Persian for LaTeX, using XeTeX
☆11 · May 13, 2020 · Updated 6 years ago
cod3licious / textcatvis
tools to analyze a collection of texts and identify relevant words
☆12 · May 20, 2018 · Updated 8 years ago
Foroozani / BigData_PySpark
Handle Big Data for Machine Learning using Python and PySpark, Building ETL Pipelines with PySpark, MongoDB, and Bokeh
☆10 · Nov 12, 2021 · Updated 4 years ago
wittawatj / jtcc
Java library to tokenize Thai text into a list of TCCs
☆21 · May 30, 2017 · Updated 9 years ago
koendeschacht / brown-cluster
Java implementation of the brown clustering algorithm that clusters words based on their contexts in a text corpus.
☆11 · Mar 2, 2018 · Updated 8 years ago
gini / gini-vision-lib-android
Android library providing components for capturing, reviewing and analyzing photos of invoices and remittance slips.
☆11 · Jun 7, 2022 · Updated 4 years ago
libmir / mir-lapack
NDSLICE wrapper for LAPACK
☆12 · Dec 19, 2023 · Updated 2 years ago
facebookresearch / analyzing-uncertainty-nmt
Analyzing Uncertainty in Neural Machine Translation
☆36 · Sep 15, 2021 · Updated 4 years ago
sacmehta / PRU
Pyramidal Recurrent Units (PRUs): A New LSTM Unit
☆10 · Aug 29, 2018 · Updated 7 years ago
ltgoslo / norec_fine
Fine-grained sentiment annotations of NoReC
☆20 · Aug 1, 2022 · Updated 3 years ago
masoudpz / AVID-Adversarial-Visual-Irregularity-Detection
AVID: Adversarial Visual Irregularity Detection
☆12 · Oct 27, 2024 · Updated last year