google-research-datasets/TF-IDF-IIF-top100-wordlists

These are lists for a variety of languages containing words that are distinctive to each language.

☆42

Alternatives and similar repositories for TF-IDF-IIF-top100-wordlists

Users interested in TF-IDF-IIF-top100-wordlists are comparing it to the repositories listed below.

uhermjakob / utoken
universal tokenizer
☆17 · Nov 29, 2021 · Updated 4 years ago
muhaochen / bilingual_dictionaries
This repository contains the source code and links to some datasets used in the CoNLL 2019 paper "Learning to Represent Bilingual Diction…
☆12 · Oct 1, 2020 · Updated 5 years ago
Yinghao-Li / CHMM-ALT
Code for "BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition"
☆32 · Jun 20, 2023 · Updated 3 years ago
cisnlp / GlotWeb
[WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages
☆17 · Apr 14, 2026 · Updated 3 months ago
cyr19 / MENLI
☆17 · Nov 20, 2023 · Updated 2 years ago
ElotlMX / py-elotl
Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.
☆24 · Sep 4, 2025 · Updated 10 months ago
wooorm / trigrams
Trigram files for 500+ languages
☆24 · Mar 21, 2025 · Updated last year
cidles / pyannotation
PyAnnotation is a Python Library to access and manipulate linguistically annotated corpus files.
☆17 · Sep 4, 2012 · Updated 13 years ago
jeongukjae / tfds-korean
A collection of Korean Text Datasets ready to use using Tensorflow-Datasets.
☆20 · Jun 8, 2022 · Updated 4 years ago
mnm-team / latex-beamer
Latex Beamer Theme
☆18 · Apr 25, 2025 · Updated last year
OpenEdition / tei.openedition
ARCHIVE / OpenEdition TEI Schema / MOVE TO https://gitlab.openedition.org
☆18 · Apr 9, 2025 · Updated last year
alirezamshi-zz / small100
Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E…
☆26 · Nov 4, 2022 · Updated 3 years ago
w3c / elreq
Ethiopic Layout Requirements
☆12 · Mar 20, 2026 · Updated 4 months ago
cisnlp / GlotScript
[LREC 2024] 🖋 Resource and Tool for Writing System Identification
☆22 · Mar 29, 2026 · Updated 4 months ago
midas-research / bhaav
Dataset of sentences from Hindi stories tagged with different emotion tags
☆11 · Nov 26, 2019 · Updated 6 years ago
LBeaudoux / tatoebatools
A library for fetching and reading Tatoeba's weekly exports
☆24 · Feb 5, 2026 · Updated 5 months ago
sdtblck / Opensubtitles_dataset
downloads and parses subtitle dataset from opensubtitles.org
☆15 · Apr 19, 2024 · Updated 2 years ago
Helsinki-NLP / OpusFilter
OpusFilter - Parallel corpus processing toolkit
☆115 · Jul 1, 2026 · Updated 3 weeks ago
bltlab / mot
Multilingual Open Text
☆26 · May 8, 2025 · Updated last year
Helsinki-NLP / OPUS-translator
Translation demonstrator
☆37 · May 12, 2020 · Updated 6 years ago
adityamogadala / xLiMeSemanticIntegrator
More Information about Features, Deliverables and Publications @
☆11 · May 17, 2016 · Updated 10 years ago
midas-research / hindi-nli-data
a repository containing the details of natural language inference dataset in Hindi
☆14 · Dec 28, 2020 · Updated 5 years ago
sillsdev / silnlp
A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.
☆37 · Updated this week
gglobster / trappist
TRAPPIST: Totally Rad Analysis Pipelines Python Informatics Super Tool (actually, a python-based genomics analysis toolbox)
☆14 · Jan 25, 2018 · Updated 8 years ago
besacier / mboshi-french-parallel-corpus
☆23 · Apr 8, 2022 · Updated 4 years ago
MrBananaHuman / KoreanCharacterBert
Korean BERT model using character tokenizer
☆27 · Apr 8, 2021 · Updated 5 years ago
lovit / huggingface_konlpy
Training Transformers of Huggingface with KoNLPy
☆68 · Aug 28, 2020 · Updated 5 years ago
PrathamOrg / ASER-Dataset
☆15 · May 13, 2020 · Updated 6 years ago
UKPLab / adaptable-adapters
☆25 · Jul 12, 2022 · Updated 4 years ago
OpenGravestones / OpenGravestones
A project to provide open burial data built on open standards.
☆19 · Oct 2, 2015 · Updated 10 years ago
wellecks / mgs
MLE-Guided Parameter Search (AAAI 2021)
☆12 · Sep 16, 2021 · Updated 4 years ago
libris / librisxl
Libris XL
☆57 · Updated this week
michaelmilleryoder / fanfiction-nlp
An NLP processing pipeline for characters in fanfiction. Developed by students at Carnegie Mellon University from 2019-2021.
☆38 · Feb 2, 2026 · Updated 5 months ago
rewire-online / edos
Public repository for SemEval 2023 - Task 10 - Explainable Detection of Online Sexism (EDOS)
☆26 · Apr 13, 2023 · Updated 3 years ago
nyu-dl / dl4mt-multi-src
☆19 · Mar 15, 2017 · Updated 9 years ago
baoy-nlp / DSS-VAE-pytorch
Generating Sentences from Disentangled Syntactic and Semantic Spaces
☆11 · Jun 24, 2019 · Updated 7 years ago
jungokasai / beam_with_patience
☆46 · Apr 13, 2022 · Updated 4 years ago
jerinphilip / ilmulti
Tooling to play around with multilingual machine translation for Indian Languages.
☆22 · Mar 5, 2022 · Updated 4 years ago
w3c / afrlreq
African language enablement for the Web
☆11 · Mar 19, 2026 · Updated 4 months ago