andersonbcdefg / dpo-lora
direct preference optimization with only 1 model copy :)
☆14 · Updated last year
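The tagline refers to direct preference optimization (DPO). The minimal sketch below shows where a reference model enters the per-example DPO loss, computed from four sequence log-probabilities. It is not taken from the dpo-lora code. It assumes that the policy log-probs come from the base model with its LoRA adapter active and that the reference log-probs come from the same weights with the adapter switched off, which is one way to avoid holding a second model copy. The helper name `dpo_loss` and the default `beta=0.1` are hypothetical, illustrative choices.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss, -log(sigmoid(beta * margin)).

    Hypothetical helper. Assumption: the policy log-probs come from the model
    with its LoRA adapter enabled, and the reference log-probs come from the
    same base weights with the adapter disabled, so only one copy is held.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids exp overflow.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # The policy raises the chosen response's log-prob relative to the
    # reference and lowers the rejected one, so the loss drops below log(2).
    print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

Averaging this value over a batch of preference pairs gives the training objective. When the policy and reference agree exactly, the margin is zero and the loss is log(2). `beta` plays the role of the KL coefficient in the underlying objective, so a larger value keeps the policy closer to the reference.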
Alternatives and similar repositories for dpo-lora
Users interested in dpo-lora are comparing it to the repositories listed below.
- ☆68 · Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 7 months ago
- A 7B parameter model for mathematical reasoning ☆40 · Updated 6 months ago
- Functional Benchmarks and the Reasoning Gap ☆88 · Updated 10 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆191 · Updated last year
- Public Inflection Benchmarks ☆68 · Updated last year
- ☆102 · Updated 11 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆223 · Updated last year
- ☆121 · Updated 6 months ago
- Open source interpretability artefacts for R1. ☆158 · Updated 4 months ago
- Just a bunch of benchmark logs for different LLMs ☆120 · Updated last year
- ☆98 · Updated 4 months ago
- ☆133 · Updated 5 months ago
- Evaluating LLMs with fewer examples ☆160 · Updated last year
- ☆139 · Updated last week
- ☆14 · Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆150 · Updated 6 months ago
- Simplex Random Feature attention, in PyTorch ☆74 · Updated last year
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆39 · Updated 2 weeks ago
- ☆89 · Updated 7 months ago
- look how they massacred my boy ☆64 · Updated 10 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆74 · Updated 5 months ago
- Data preparation code for Amber 7B LLM ☆91 · Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆229 · Updated last month
- Experiments for efforts to train a new and improved t5 ☆76 · Updated last year
- ☆13 · Updated 4 months ago
- ☆76 · Updated 2 months ago
- Compiling useful links, papers, benchmarks, ideas, etc. ☆45 · Updated 5 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆179 · Updated 5 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆74 · Updated 3 months ago