datadreamer-dev / DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models.
☆1,057 · Updated 7 months ago
Alternatives and similar repositories for DataDreamer
Users who are interested in DataDreamer are comparing it to the libraries listed below.
- A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. ☆790 · Updated 2 months ago
- Evaluate your LLM's response with Prometheus and GPT4 ☆981 · Updated 4 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,878 · Updated last week
- Easily embed, cluster and semantically label text datasets ☆567 · Updated last year
- Automatically evaluate your LLMs in Google Colab ☆658 · Updated last year
- Train Models Contrastively in Pytorch ☆745 · Updated 5 months ago
- ☆538 · Updated 9 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,528 · Updated 3 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,890 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,616 · Updated 3 weeks ago
- Best practices for distilling large language models. ☆574 · Updated last year
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,309 · Updated last week
- Generative Representational Instruction Tuning ☆671 · Updated 2 months ago
- Synthetic data curation for post-training and structured data extraction ☆1,495 · Updated last month
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,512 · Updated 7 months ago
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,311 · Updated this week
- A reading list on LLM based Synthetic Data Generation ☆1,407 · Updated 3 months ago
- ☆1,034 · Updated 8 months ago
- Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders' ☆1,585 · Updated 7 months ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆532 · Updated last week
- Automated Evaluation of RAG Systems ☆654 · Updated 5 months ago
- Bringing BERT into modernity via both architecture changes and scaling ☆1,516 · Updated 2 months ago
- ☆1,077 · Updated last year
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆2,004 · Updated last year
- awesome synthetic (text) datasets ☆296 · Updated 2 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆661 · Updated last year
- An Open Source Toolkit For LLM Distillation ☆724 · Updated 2 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆434 · Updated last year
- Fast Semantic Text Deduplication & Filtering ☆803 · Updated last week
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,400 · Updated last year