☆107Jun 2, 2025Updated 9 months ago
Alternatives and similar repositories for NeoBERT
Users that are interested in NeoBERT are comparing it to the libraries listed below
- LTG-Bert☆34Jan 8, 2024Updated 2 years ago
- State-of-the-art paired encoder and decoder models (17M-1B params)☆59Aug 6, 2025Updated 6 months ago
- coded with and corrected by Google Anti-Gravity☆13Nov 23, 2025Updated 3 months ago
- User-friendly viewer for Parquet files☆10Jan 10, 2026Updated last month
- Official code release for "SuperBPE: Space Travel for Language Models"☆89Jan 9, 2026Updated last month
- Official implementation of "GPT or BERT: why not both?"☆62Jul 28, 2025Updated 7 months ago
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings☆44Mar 6, 2024Updated last year
- An extension of the Transformers library to include a T5ForSequenceClassification class.☆40Apr 17, 2023Updated 2 years ago
- KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a l…☆25Jul 27, 2024Updated last year
- Code for SaGe subword tokenizer (EACL 2023)☆27Nov 30, 2024Updated last year
- Datamodels for hugging face tokenizers☆99Updated this week
- A context-aware embedding similarity score☆11Aug 23, 2023Updated 2 years ago
- decontamination☆26Dec 3, 2025Updated 3 months ago
- ☆57Jan 26, 2025Updated last year
- ☆10Oct 15, 2019Updated 6 years ago
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model.☆12Dec 24, 2022Updated 3 years ago
- Bringing BERT into modernity via both architecture changes and scaling☆1,633Jun 30, 2025Updated 8 months ago
- A tiny BERT for low-resource monolingual models☆31Dec 24, 2025Updated 2 months ago
- 🚀🤗 A collection of templates for Hugging Face Spaces☆35Oct 9, 2023Updated 2 years ago
- Delayed Evaluation With tidyverse Verbs☆16Sep 3, 2023Updated 2 years ago
- This repository contains the training and evaluation code for llm-jp-modernbert-base.☆15Jun 17, 2025Updated 8 months ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets.☆13Nov 21, 2023Updated 2 years ago
- ☆11Apr 25, 2021Updated 4 years ago
- ☆12Dec 6, 2024Updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆32Aug 5, 2023Updated 2 years ago
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?"☆18May 15, 2025Updated 9 months ago
- ANE accelerated embedding models!☆20Dec 11, 2024Updated last year
- Common Voice Generator using Speech Synthesizer☆13Jul 28, 2021Updated 4 years ago
- SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP☆14Mar 24, 2021Updated 4 years ago
- PathPiece tokenizer☆13Nov 10, 2024Updated last year
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations.☆15Sep 3, 2024Updated last year
- LLM application tracing based on OpenTelemetry☆16Nov 24, 2025Updated 3 months ago
- The training codes of Jasper-Token-Compression-600M☆19Nov 19, 2025Updated 3 months ago
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval"☆33Sep 20, 2025Updated 5 months ago
- Efficient few-shot learning with cross-encoders.☆63Feb 16, 2024Updated 2 years ago
- A massively multilingual modern encoder language model☆131Jan 20, 2026Updated last month
- EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling☆34Nov 21, 2021Updated 4 years ago
- A repo for code based language models☆18Feb 10, 2021Updated 5 years ago
- The official repository for 'CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos', SIGI…☆16May 4, 2022Updated 3 years ago