andersonbcdefg / dpo-lora
direct preference optimization with only 1 model copy :)
☆12 · Updated last year
Related projects
Alternatives and complementary repositories for dpo-lora
- Score LLM pretraining data with classifiers ☆54 · Updated last year
- ☆57 · Updated 11 months ago
- Public Inflection Benchmarks ☆69 · Updated 8 months ago
- ☆55 · Updated last month
- Data preparation code for CrystalCoder 7B LLM ☆42 · Updated 6 months ago
- Simplex Random Feature attention, in PyTorch ☆71 · Updated last year
- ☆62 · Updated last month
- ☆49 · Updated 6 months ago
- Small, simple agent task environments for training and evaluation ☆16 · Updated 2 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- ☆22 · Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆128 · Updated 3 weeks ago
- A new way to generate large quantities of high quality synthetic data (on par with GPT-4), with better controllability, at a fraction of … ☆21 · Updated last month
- Genetics for Language Models ☆12 · Updated 4 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆69 · Updated last year
- ☆25 · Updated 2 months ago
- ☆36 · Updated 3 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆23 · Updated 8 months ago
- A synthetic story narration dataset to study small audio LMs. ☆30 · Updated 10 months ago
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆21 · Updated 5 months ago
- train with kittens! ☆49 · Updated 3 weeks ago
- ☆41 · Updated 2 weeks ago
- This is the official repository for all the code of TheoremLlama ☆32 · Updated last month
- Repository for the paper Stream of Search: Learning to Search in Language ☆93 · Updated 3 months ago
- ☆48 · Updated last year
- A repository of projects and datasets under active development by Alignment Lab AI ☆22 · Updated 10 months ago
- Experiments for efforts to train a new and improved t5 ☆76 · Updated 7 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆87 · Updated 3 months ago
- Lightweight tools for quick and easy LLM demos ☆26 · Updated last month
- Can Language Models Solve Olympiad Programming? ☆101 · Updated 3 months ago