Watchfulio / dataset-generatorLinks

A new way to generate large quantities of high quality synthetic data (on par with GPT-4), with better controllability, at a fraction of the cost of prompting LLMs directly.

☆22

Alternatives and similar repositories for dataset-generator

Users that are interested in dataset-generator are comparing it to the libraries listed below

Sorting:

official-elinas / zeus-llm-trainer
Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models
☆69Updated last year
TheDuckAI / arb
Advanced Reasoning Benchmark Dataset for LLMs
☆47Updated last year
choosewhatulike / case2code
☆15Updated 3 months ago
xingyaoww / LeTI
Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."
☆65Updated 2 years ago
argilla-io / distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
☆24Updated last year
qrdlgit / graph-of-thoughts
Based on the tree of thoughts paper
☆48Updated last year
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60Updated 10 months ago
arcee-ai / DAM
☆52Updated 8 months ago
CarperAI / treasure_trove
☆22Updated last year
likenneth / persona_drift
Measuring and Controlling Persona Drift in Language Model Dialogs
☆17Updated last year
deployradiant / pychatml
Chat Markup Language conversation library
☆55Updated last year
CERC-AAI / Robin
☆63Updated 9 months ago
LLM360 / crystalcoder-data-prep
Data preparation code for CrystalCoder 7B LLM
☆45Updated last year
scottlogic-alex / prm800k-denorm
Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format
☆27Updated 2 years ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.
☆42Updated 2 months ago
geronimi73 / phi2-finetune
☆87Updated last year
Zyphra / Zyda_processing
☆35Updated last year
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆32Updated 2 months ago
pacman100 / peft-codegen-25
☆23Updated 2 years ago
alexrs / herd
Mixture of Expert (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations.
☆12Updated last year
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite.
☆33Updated last year
tanyuqian / cappy
NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
☆43Updated last year
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆63Updated 2 years ago
Zyphra / zcookbook
Training hybrid models for dummies.
☆25Updated 6 months ago
kyegomez / LOGICGUIDE
Plug in and Play implementation of "Certified Reasoning with Language Models" that elevates model reasoning by 40%
☆17Updated 2 years ago
leloykun / mmsg
Generate interleaved text and image content in a structured format you can directly pass to downstream APIs.
☆28Updated 8 months ago
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆71Updated last year
csinva / iprompt
Finding semantically meaningful and accurate prompts.
☆47Updated last year
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59Updated 5 months ago
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31Updated last year