embeddings-benchmark / arena

Code for the MTEB Arena

☆23

Alternatives and similar repositories for arena

Users who are interested in arena are comparing it to the libraries listed below.

jxmorris12 / bm25_pt
minimal pytorch implementation of bm25 (with sparse tensors)
☆104 Updated last year
bminixhofer / tokenkit
A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.
☆46 Updated 3 months ago
jxmorris12 / cde
code for training & evaluating Contextual Document Embedding models
☆197 Updated 5 months ago
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆273 Updated last year
bminixhofer / zett
Code for Zero-Shot Tokenizer Transfer
☆138 Updated 9 months ago
sileod / tasksource
Dataset collection and preprocessing framework for NLP extreme multitask learning
☆188 Updated 3 months ago
AnswerDotAI / fastkmeans
☆77 Updated 3 months ago
Knowledgator / FlashDeBERTa
Truly flash implementation of the DeBERTa disentangled attention mechanism.
☆66 Updated 3 weeks ago
princeton-nlp / LitSearch
[EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search
☆98 Updated 10 months ago
Hannibal046 / nanoColBERT
Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).
☆80 Updated last year
mungg / FABLES
☆57 Updated last year
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
☆216 Updated 2 months ago
davanstrien / haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
☆66 Updated last year
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆79 Updated 10 months ago
malteos / llm-datasets
A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.
☆61 Updated last year
allenai / infinigram-api
☆80 Updated this week
AnswerDotAI / ModernBERT-Instruct-mini-cookbook
☆49 Updated 8 months ago
MinishLab / tokenlearn
Pre-train Static Word Embeddings
☆87 Updated last month
google-deepmind / mishax
☆142 Updated last month
pchizhov / picky_bpe
BPE modification that removes intermediate tokens during tokenizer training.
☆25 Updated 10 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79 Updated last year
jakespringer / echo-embeddings
☆155 Updated last year
sail-sg / sailcraft
🚢 Data Toolkit for Sailor Language Models
☆94 Updated 7 months ago
QuixiAI / spectrum
☆136 Updated last month
liujch1998 / infini-gram
☆72 Updated 2 months ago
LAGoM-NLP / transtokenizer
☆52 Updated 8 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated last year
leap-laboratories / PIZZA
An attribution library for LLMs
☆43 Updated last year
huggingface / fineweb-2
☆196 Updated 3 months ago
schen149 / sub-sentence-encoder
The official code repo for "Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations".
☆83 Updated last year