nomic-ai / contrastors

Train Models Contrastively in PyTorch

☆731

Alternatives and similar repositories for contrastors

Users who are interested in contrastors are comparing it to the libraries listed below

ContextualAI / gritlm
Generative Representational Instruction Tuning
☆664 · Updated last month
huggingface / cosmopedia
☆529 · Updated 8 months ago
tomaarsen / attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
☆702 · Updated last year
allenai / OLMo-Eval
Evaluation suite for LLMs
☆356 · Updated 3 weeks ago
castorini / rank_llm
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
☆508 · Updated this week
huggingface / text-clustering
Easily embed, cluster and semantically label text datasets
☆560 · Updated last year
microsoft / MS-MARCO-Web-Search
A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
☆340 · Updated 7 months ago
BatsResearch / bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
☆783 · Updated 3 weeks ago
arcee-ai / DistillKit
An Open Source Toolkit For LLM Distillation
☆698 · Updated 3 weeks ago
xfactlab / orpo
Official repository for ORPO
☆461 · Updated last year
mlabonne / llm-autoeval
Automatically evaluate your LLMs in Google Colab
☆649 · Updated last year
arielnlee / Platypus
Code for fine-tuning Platypus fam LLMs using LoRA
☆628 · Updated last year
datadreamer-dev / DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
☆1,041 · Updated 6 months ago
allenai / dolma
Data and tools for generating and inspecting OLMo pre-training data.
☆1,283 · Updated this week
datamllab / LongLM
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
☆660 · Updated last year
xhluca / bm25s
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
☆1,267 · Updated 2 months ago
mistralai-sf24 / hackathon
☆447 · Updated last year
AnswerDotAI / ModernBERT
Bringing BERT into modernity via both architecture changes and scaling
☆1,469 · Updated last month
SeanLee97 / AnglE
Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
☆550 · Updated 4 months ago
jondurbin / bagel
A bagel, with everything.
☆323 · Updated last year
mlfoundations / open_lm
A repository for research on medium sized language models.
☆506 · Updated 2 months ago
jquesnelle / yarn
YaRN: Efficient Context Window Extension of Large Language Models
☆1,553 · Updated last year
apoorvumang / prompt-lookup-decoding
☆556 · Updated 11 months ago
davanstrien / awesome-synthetic-datasets
awesome synthetic (text) datasets
☆291 · Updated 3 weeks ago
magpie-align / magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …
☆744 · Updated 4 months ago
prometheus-eval / prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
☆978 · Updated 3 months ago
jzhang38 / EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆739 · Updated 10 months ago
RAIVNLab / MRL
Code repository for the paper - "Matryoshka Representation Learning"
☆534 · Updated last year
google-deepmind / long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
☆627 · Updated 3 weeks ago
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆1,793 · Updated this week