Nicolas-BZRD / EuroBERT
★48 Updated this week
Alternatives and similar repositories for EuroBERT:
Users who are interested in EuroBERT are comparing it to the libraries listed below:
- Pre-train Static Word Embeddings ★51 Updated 3 weeks ago
- ★38 Updated last month
- NLP with Rust for Python 🦀🐍 ★61 Updated 9 months ago
- ★43 Updated last month
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ★126 Updated 3 months ago
- Generalist and Lightweight Model for Text Classification ★92 Updated this week
- Python API for https://vespa.ai, the open big data serving engine ★115 Updated this week
- ★47 Updated last year
- ★85 Updated 3 months ago
- An introduction to LLM Sampling ★77 Updated 3 months ago
- Notebooks for training universal 0-shot classifiers on many different tasks ★120 Updated 3 months ago
- ★120 Updated 5 months ago
- minimal pytorch implementation of bm25 (with sparse tensors) ★97 Updated last year
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training ★62 Updated last month
- ★15 Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization ★59 Updated last year
- C++ inference engine for running GLiNER (Generalist and Lightweight Named Entity Recognition) models ★25 Updated 3 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ★34 Updated 3 months ago
- XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval ★49 Updated 9 months ago
- Code for Zero-Shot Tokenizer Transfer ★125 Updated 2 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ★26 Updated 11 months ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ★17 Updated last year
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ★96 Updated last week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ★190 Updated 5 months ago
- code for training & evaluating Contextual Document Embedding models ★176 Updated 2 months ago
- Analysis on the cost of encoder based models ★11 Updated last month
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ★47 Updated last year
- Efficient few-shot learning with cross-encoders. ★50 Updated last year
- Using short models to classify long texts ★21 Updated 2 years ago
- Vector Database with support for late interaction and token level embeddings. ★53 Updated 5 months ago