piskvorky/gensim-data

Data repository for pretrained NLP models and NLP corpora.

☆1,058

Alternatives and similar repositories for gensim-data

Users who are interested in gensim-data are comparing it to the libraries listed below.

piskvorky / gensim
Topic Modelling for Humans
☆16,464 · Nov 1, 2025 · Updated 8 months ago
stanfordnlp / GloVe
Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
☆7,226 · Jul 27, 2025 · Updated 11 months ago
facebookresearch / fastText
Library for fast text representation and classification.
☆26,549 · Mar 22, 2024 · Updated 2 years ago
3Top / word2vec-api
Simple web service providing a word embedding model
☆1,436 · May 1, 2023 · Updated 3 years ago
allenai / allennlp
An open-source NLP research library, built on PyTorch.
☆11,889 · Nov 22, 2022 · Updated 3 years ago
niderhoff / nlp-datasets
Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)
☆5,990 · Feb 15, 2023 · Updated 3 years ago
commonsense / conceptnet-numberbatch
☆1,322 · Jul 18, 2022 · Updated 4 years ago
facebookresearch / InferSent
InferSent sentence embeddings
☆2,280 · Aug 30, 2021 · Updated 4 years ago
sebastianruder / NLP-progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo…
☆22,957 · Jul 28, 2024 · Updated last year
huggingface / sentence-transformers
State-of-the-Art Embeddings, Retrieval, and Reranking
☆18,918 · Updated this week
flairNLP / flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
☆14,382 · Oct 27, 2025 · Updated 8 months ago
google-research / bert
TensorFlow code and pre-trained models for BERT
☆40,058 · Jul 23, 2024 · Updated last year
chartbeat-labs / textacy
NLP, before and after spaCy
☆2,239 · Sep 22, 2023 · Updated 2 years ago
PetrochukM / PyTorch-NLP
Basic Utilities for PyTorch Natural Language Processing (NLP)
☆2,224 · Jul 4, 2023 · Updated 3 years ago
facebookresearch / SentEval
A Python tool for evaluating the quality of sentence embeddings.
☆2,110 · Mar 19, 2024 · Updated 2 years ago
dmlc / gluon-nlp
NLP made easy
☆2,544 · Oct 6, 2023 · Updated 2 years ago
piskvorky / bounter
Efficient Counter that uses a limited (bounded) amount of memory regardless of data size.
☆931 · Nov 20, 2022 · Updated 3 years ago
Hironsan / awesome-embedding-models
A curated list of awesome embedding models tutorials, projects and communities.
☆1,843 · Apr 7, 2019 · Updated 7 years ago
huggingface / neuralcoref
✨ Fast Coreference Resolution in spaCy with Neural Networks
☆2,892 · Apr 13, 2023 · Updated 3 years ago
src-d / wmd-relax
Calculates Word Mover's Distance Insanely Fast
☆458 · Aug 17, 2023 · Updated 2 years ago
RaRe-Technologies / topic_eval
Tools and services for evaluating topic models
☆15 · Apr 12, 2016 · Updated 10 years ago
explosion / sense2vec
🦆 Contextually-keyed word vectors
☆1,678 · Mar 27, 2026 · Updated 3 months ago
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,752 · May 19, 2026 · Updated 2 months ago
facebookresearch / pytext
A natural language modeling framework based on PyTorch
☆6,296 · Oct 17, 2022 · Updated 3 years ago
sb1992 / NETL-Automatic-Topic-Labelling-
Generating labels for topics automatically using neural embeddings
☆186 · Sep 10, 2025 · Updated 10 months ago
pytorch / text
Models, data loaders and abstractions for language processing, powered by PyTorch
☆3,559 · Sep 10, 2025 · Updated 10 months ago
oborchers / Fast_Sentence_Embeddings
Compute Sentence Embeddings Fast!
☆624 · Mar 2, 2023 · Updated 3 years ago
IntelLabs / nlp-architect
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural …
☆2,933 · Nov 7, 2022 · Updated 3 years ago
jina-ai / clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
☆12,828 · Jan 23, 2024 · Updated 2 years ago
facebookresearch / StarSpace
Learning embeddings for classification, retrieval and ranking.
☆3,954 · Dec 4, 2022 · Updated 3 years ago
brmson / dataset-sts
Semantic Text Similarity Dataset Hub
☆730 · May 19, 2018 · Updated 8 years ago
bheinzerling / bpemb
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
☆1,222 · Oct 1, 2024 · Updated last year
facebookresearch / MUSE
A library for Multilingual Unsupervised or Supervised word Embeddings
☆3,248 · Aug 31, 2022 · Updated 3 years ago
DerwenAI / pytextrank
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
☆2,217 · Jun 24, 2026 · Updated 3 weeks ago
huggingface / transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model…
☆162,712 · Updated this week
NIHOPA / NLPre
Python library for Natural Language Preprocessing (NLPre)
☆190 · Jul 31, 2023 · Updated 2 years ago
ddangelov / Top2Vec
Top2Vec learns jointly embedded topic, document and word vectors.
☆3,104 · Nov 14, 2024 · Updated last year
makcedward / nlpaug
Data augmentation for NLP
☆4,662 · Updated this week
zihangdai / xlnet
XLNet: Generalized Autoregressive Pretraining for Language Understanding
☆6,180 · May 28, 2023 · Updated 3 years ago