JHU-CLSP/mmBERT

A massively multilingual modern encoder language model

☆145

Alternatives and similar repositories for mmBERT

Users who are interested in mmBERT are comparing it to the repositories listed below.

JHU-CLSP / ettin-encoder-vs-decoder
State-of-the-art paired encoder and decoder models (17M-1B params)
☆76 · Aug 6, 2025 · Updated 11 months ago
hltcoe / rank-k
Repository for the listwise reranker Rank-K
☆16 · May 23, 2025 · Updated last year
DunZhang / Jasper-Token-Compression-Training
The training codes of Jasper-Token-Compression-600M
☆20 · Nov 19, 2025 · Updated 8 months ago
stefan-it / modern-bert-ner
My NER Experiments with ModernBERT and Ettin
☆29 · Jul 17, 2025 · Updated last year
worldbank / GISTEmbed
GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings
☆45 · Mar 6, 2024 · Updated 2 years ago
illuin-tech / contextual-embeddings
Model implementation for the contextual embeddings project
☆47 · Jun 2, 2025 · Updated last year
recombee / CompresSAE
Sparse Embedding Compression for Scalable Retrieval in Recommender Systems
☆39 · Nov 21, 2025 · Updated 8 months ago
stephantul / pynife
Nearly Inference Free Embeddings: make your RAG queries 500x faster
☆80 · Apr 27, 2026 · Updated 2 months ago
DSBA-Lab / Contrastive-Accumulation
☆14 · Jul 7, 2024 · Updated 2 years ago
ibm-granite / granite-embedding-models
☆77 · May 14, 2026 · Updated 2 months ago
iPieter / llmq
A Scheduler for Batched LLM Inference
☆19 · Oct 5, 2025 · Updated 9 months ago
knowledgeable-embedding / knowledgeable-embedding
Knowledgeable Embedding: Injecting dynamically updatable entity knowledge into embeddings to enhance RAG
☆15 · Aug 31, 2025 · Updated 10 months ago
feyninc / tokie
🍡 30x faster tokenization for every HuggingFace model
☆49 · Updated this week
hotchpotch / yasem
YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings
☆13 · May 22, 2025 · Updated last year
Helw150 / levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
☆16 · Jun 16, 2024 · Updated 2 years ago
oceanumeric / EnteRAG
A RAG that can scale 🧑🏻‍💻
☆11 · May 28, 2024 · Updated 2 years ago
s-smits / modernbert-finetune
Fine-tune ModernBERT with custom tokenizers, curriculum learning, and next-gen optimizers.
☆74 · Jan 16, 2026 · Updated 6 months ago
roipony / flash-maxsim
☆27 · Jun 11, 2026 · Updated last month
frinkleko / LIMIT-Sparse-Embedding
Evaluate state-of-the-art sparse embedding models on the LIMIT dataset (`limit-small` and `limit`) from Google's paper `On the Theoretica…
☆16 · Sep 4, 2025 · Updated 10 months ago
stephantul / skeletoken
Datamodels for Hugging Face tokenizers
☆109 · Jun 18, 2026 · Updated last month
chandar-lab / NeoBERT
☆109 · Jun 2, 2025 · Updated last year
Knowledgator / FlashDeBERTa
Truly flash implementation of DeBERTa disentangled attention mechanism.
☆90 · Feb 10, 2026 · Updated 5 months ago
gangiswag / cornstack
☆56 · Jun 21, 2025 · Updated last year
lightonai / pylate
Late Interaction Models Training & Retrieval
☆876 · Updated this week
swiss-ai / parity-aware-bpe
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization [ACL 2026]
☆20 · Apr 18, 2026 · Updated 3 months ago
ottowg / gsap-ner
☆10 · Oct 2, 2024 · Updated last year
allenai / natural-perturbations
Natural Perturbation for Robust Question Answering
☆12 · Apr 7, 2020 · Updated 6 years ago
stephantul / piecelearn
Learning BPE embeddings by first learning a segmentation model and then training word2vec
☆19 · Dec 18, 2022 · Updated 3 years ago
ruyimarone / data-portraits
Documenting large text datasets 🖼️ 📚
☆14 · Dec 17, 2024 · Updated last year
AnswerDotAI / ModernBERT
Bringing BERT into modernity via both architecture changes and scaling
☆1,704 · Mar 1, 2026 · Updated 4 months ago
pixeli99 / MixLN
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆30 · Jul 24, 2025 · Updated last year
thakur-nandan / income
INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ.
☆24 · Sep 24, 2023 · Updated 2 years ago
orionw / rank1
Test-time compute in information retrieval
☆59 · Jul 8, 2025 · Updated last year
LLM360 / k2v2_train
Training codebase for K2-V2
☆22 · Dec 17, 2025 · Updated 7 months ago
hotchpotch / yast
YAST - Yet Another SPLADE or Sparse Trainer
☆21 · Jun 16, 2025 · Updated last year
cmpnd-ai / dspy-qwen-adapter
A DSPy adapter tailored to Qwen 3+ suggested formatting patterns.
☆23 · Apr 29, 2026 · Updated 2 months ago
cisnlp / multypo
A Multilingual Keyboard Layout-Based Typo Generator
☆17 · Nov 23, 2025 · Updated 8 months ago
Snowflake-Labs / arctic-embed
☆89 · Nov 3, 2025 · Updated 8 months ago
MinishLab / semhash
Fast Multimodal Semantic Deduplication & Filtering
☆953 · May 24, 2026 · Updated 2 months ago