uhermjakob/utoken

universal tokenizer

☆17

Alternatives and similar repositories for utoken

Users that are interested in utoken are comparing it to the libraries listed below.

BibleNLP / ebible
Curated corpus of parallel data derived from versions of the Bible provided by eBible.org.
☆98 · May 23, 2025 · Updated last year
timjogorman / Multisentence-AMR-guidelines
Guidelines for our secondary layer of annotation adding multi-sentence AMR links
☆12 · Sep 6, 2017 · Updated 8 years ago
coastalcph / supersense-data-twitter
Tweets annotated with coarse-grained sense labels (supersenses)
☆13 · Jun 13, 2014 · Updated 12 years ago
ablodge / leamr
A structurally comprehensive dataset of AMR-to-text alignments for coverage of a larger variety of linguistic phenomena, for research rel…
☆16 · Dec 10, 2022 · Updated 3 years ago
friendsofagape / autographa
A Bible translation editor for everyone.
☆23 · Jul 19, 2023 · Updated 3 years ago
BibleNLP / awesome-bible-nlp
A curated list of resources dedicated to Biblical Natural Language Processing
☆32 · Sep 5, 2025 · Updated 10 months ago
isi-nlp / uroman
Universal Romanizer that can convert any Unicode script to Roman (Latin) script
☆250 · Jul 26, 2024 · Updated 2 years ago
tnq177 / witwicky
Witwicky: An implementation of Transformer in PyTorch.
☆22 · Aug 17, 2020 · Updated 5 years ago
google-research-datasets / TF-IDF-IIF-top100-wordlists
These are lists for a variety of languages containing words that are distinctive to each language.
☆42 · Apr 5, 2022 · Updated 4 years ago
thammegowda / tika-ner-corenlp
Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser
☆13 · Feb 26, 2022 · Updated 4 years ago
thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆167 · Apr 13, 2026 · Updated 3 months ago
uhermjakob / wildebeest
Scripts to investigate, repair, and normalize a wide range of text file problems at the character level.
☆23 · May 25, 2026 · Updated 2 months ago
sillsdev / silnlp
A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.
☆37 · Updated this week
crockpotveggies / dl4j-examples
Deeplearning4j Examples (DL4J, DL4J Spark, DataVec)
☆10 · Aug 16, 2018 · Updated 7 years ago
bibledit / old
Older code for Bibledit
☆10 · May 9, 2017 · Updated 9 years ago
kaxap / candle-go
Use HuggingFace's Candle with Go.
☆17 · Aug 10, 2023 · Updated 2 years ago
noa / iur
Official repository for the EMNLP 2019 paper, "Learning Invariant Representations of Social Media Users."
☆12 · Aug 27, 2021 · Updated 4 years ago
marian-nmt / sotastream
A library for data streaming and augmentation
☆22 · May 5, 2025 · Updated last year
icsi-berkeley / ecg_framenet
Package for reading in FrameNet data and performing operations on it, such as creating ECG grammars.
☆30 · Mar 6, 2020 · Updated 6 years ago
Sushegaad / Semantic-Privacy-Guard
Semantic Privacy Guard: A Java middleware that intercepts text, identifies PII using a three-layer hybrid pipeline (Regex + Naive Bayes M…
☆16 · Jun 14, 2026 · Updated last month
educastellano / qr.js
qr.js: QR code generator in pure Javascript (2011)
☆13 · Jul 23, 2024 · Updated 2 years ago
Evs91 / HealthkitImportInfluxDB
Script to process your Healthkit "export.xml" to InfluxDB.
☆16 · Mar 2, 2018 · Updated 8 years ago
lcnittl / DMFO
Diff and Merge for Office
☆18 · Jul 13, 2026 · Updated 2 weeks ago
j-luo93 / MorphForest
Code for Unsupervised Learning of Morphological Forest
☆14 · Aug 12, 2019 · Updated 6 years ago
MGheini / xattn-transfer-for-mt
Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Tra…
☆33 · Sep 15, 2021 · Updated 4 years ago
kirankotari / xmlmerge
Simple command line XML Merge tool
☆13 · Aug 9, 2024 · Updated last year
RobAntunes / lingodb
The SQLite of Semantic Search
☆30 · Sep 25, 2025 · Updated 10 months ago
ShenggaoZhu / midict
MIDict (Multi-Index Dict) can be indexed by any "keys" or "values", suitable as a bidirectional/inverse dict or a multi-key/multi-value d…
☆14 · May 19, 2016 · Updated 10 years ago
eugen1j / aioscrapy
Python asynchronous library for web scraping
☆12 · Aug 24, 2021 · Updated 4 years ago
RedGhoul / FiberStarter
[GO - Fiber] Fiber Starter Project - Session Based Auth - Server Side Rendering
☆13 · Apr 4, 2026 · Updated 3 months ago
alvations / myth
Myanmar and Thai Language Resources
☆10 · Jul 18, 2022 · Updated 4 years ago
fajri91 / minangNLP
Minangkabau NLP corpus. PACLIC 2020
☆11 · Jun 7, 2021 · Updated 5 years ago
ahhhh6980 / colortypes
An abstract, safe, and concise color conversion library for Rust nightly. This requires the feature adt_const_params.
☆12 · Nov 18, 2022 · Updated 3 years ago
interscript / rababa
Rababa, the diacritization library for Arabic and Hebrew (Abjad scripts in general)
☆13 · May 1, 2025 · Updated last year
eligugliotta / tarc
Tunisian Arabish Corpus
☆12 · Mar 12, 2024 · Updated 2 years ago
mt-upc / SHAS
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
☆44 · Feb 9, 2023 · Updated 3 years ago
bible-technology / scribe-scripture-editor
A Bible translation editor for everyone.
☆24 · Jul 2, 2026 · Updated 3 weeks ago
ebisu-flashcards / flashcards-cli
Simple CLI frontend for flashcards-core
☆12 · Jul 30, 2021 · Updated 4 years ago
syhpoon / nsga
Implementation of the multi-objective genetic optimization algorithm NSGA-II
☆12 · Jun 22, 2025 · Updated last year