ICU-based universal language tokenizer
☆34Jan 19, 2022Updated 4 years ago
Alternatives and similar repositories for ICU-tokenizer
Users who are interested in ICU-tokenizer are comparing it to the libraries listed below
- The implementation of CL-ReLKT (NAACL-2022)☆14Aug 31, 2022Updated 3 years ago
- C++ mosestokenizer☆18Mar 13, 2024Updated last year
- Extensible DL-based automatic Arabic diacritization tool allowing the restoration of different types of diacritics.☆21Jul 25, 2023Updated 2 years ago
- Multilingual Open Text☆25May 8, 2025Updated 9 months ago
- TensorFlow implementation of RankGAN (Adversarial Ranking for Language Generation)☆22Jun 15, 2018Updated 7 years ago
- OpusFilter - Parallel corpus processing toolkit☆115Feb 11, 2026Updated 3 weeks ago
- Bilingual (or Multilingual) Large Language Models and In-context Learning: The key to human parity on machine translation☆32Feb 15, 2023Updated 3 years ago
- Reader Translator Generator - NMT toolkit based on PyTorch☆32Sep 12, 2023Updated 2 years ago
- Minangkabau NLP corpus. PACLIC 2020☆10Jun 7, 2021Updated 4 years ago
- These are lists for a variety of languages containing words that are distinctive to each language.☆41Apr 5, 2022Updated 3 years ago
- Creating super-parallel corpora of more than 1,500 unique languages for NLP research☆34Dec 8, 2022Updated 3 years ago
- Code for our ACL2021 paper Neural Machine Translation with Monolingual Translation Memory☆82Jun 12, 2023Updated 2 years ago
- ☆10Oct 31, 2019Updated 6 years ago
- A clean beamer/ltx-talk theme with a big title graphic☆20Feb 16, 2026Updated 2 weeks ago
- This repo contains all the cheatsheets that I found important.☆10Oct 27, 2020Updated 5 years ago
- Scrape YouTube for videos and extract screenshots from the videos☆12Feb 12, 2021Updated 5 years ago
- GENOT: Generative Neural Optimal Transport☆15Dec 18, 2024Updated last year
- This is my 2024 course for TAP Institute on Vector Databases and Semantic Searching.☆12Jul 26, 2024Updated last year
- The pipeline for the OSCAR corpus☆176Nov 9, 2025Updated 3 months ago
- A GitHub Action to set up a small SLURM cluster for testing purposes.☆14Jul 20, 2025Updated 7 months ago
- Thai word segmentation using deep learning☆14Jul 1, 2019Updated 6 years ago
- Operationele Prioritaire Stoffen model☆14Mar 25, 2025Updated 11 months ago
- Optimized inference with Ascend and Hugging Face☆12Apr 23, 2024Updated last year
- A monolithic index that supports worst-case optimal joins (WCOJ) by providing all collation orders in a single redundancy eliminating dat…☆16Sep 18, 2025Updated 5 months ago
- An abstract, safe, and concise color conversion library for Rust nightly. This requires the feature adt_const_params☆12Nov 18, 2022Updated 3 years ago
- In this project, you'll train a convolutional neural network to classify and recognize different categories of fonts. We'll be using the …☆13Feb 29, 2020Updated 6 years ago
- ACM UMAP2020 Hands-on Tutorial on Data and Algorithmic Bias in Recommender Systems☆10May 23, 2021Updated 4 years ago
- Easy & Pretrained SOTA Deep Learning for RNA strings☆12Apr 15, 2022Updated 3 years ago
- ☆10Nov 28, 2020Updated 5 years ago
- Tunisian Arabish Corpus☆11Mar 12, 2024Updated last year
- Precise type-checker for JavaScript☆11Oct 23, 2025Updated 4 months ago
- MongoDB with Pymongo Tutorial☆10Apr 19, 2024Updated last year
- ☆10Sep 27, 2021Updated 4 years ago
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding☆19Mar 2, 2025Updated last year
- ☆12Oct 2, 2024Updated last year
- An offline TTS engine for AkulAI and more.☆13Aug 21, 2024Updated last year
- Jupyter server proxy for OpenRefine☆10Oct 18, 2024Updated last year
- MIDict (Multi-Index Dict) can be indexed by any "keys" or "values", suitable as a bidirectional/inverse dict or a multi-key/multi-value d…☆14May 19, 2016Updated 9 years ago
- Data Catalog Project☆11Dec 23, 2024Updated last year