schauppi / Self-Rewarding-Language-Models
☆44 · Updated 10 months ago
Alternatives and similar repositories for Self-Rewarding-Language-Models:
Users interested in Self-Rewarding-Language-Models are comparing it to the repositories listed below.
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335) ☆29 · Updated last year
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆51 · Updated last month
- ☆32 · Updated 9 months ago
- Official implementation for "Extending LLMs' Context Window with 100 Samples" ☆75 · Updated last year
- ☆24 · Updated 6 months ago
- ☆48 · Updated 4 months ago
- A repository for research on medium-sized language models. ☆76 · Updated 10 months ago
- ☆27 · Updated this week
- Official repository for the paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆73 · Updated 9 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆75 · Updated 2 months ago
- Reference implementation for "Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model" ☆42 · Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated last year
- ☆49 · Updated 2 weeks ago
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails ☆16 · Updated last month
- ☆96 · Updated 8 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆95 · Updated 2 months ago
- Implementation of "SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models" ☆26 · Updated last month
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆44 · Updated 3 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 · Updated 6 months ago
- Replicating O1 inference-time scaling laws ☆83 · Updated 3 months ago
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch ☆53 · Updated last month
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆131 · Updated last month
- ☆39 · Updated 7 months ago
- ☆119 · Updated 5 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated last year
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆70 · Updated 4 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆136 · Updated 4 months ago
- Critique-out-Loud Reward Models ☆55 · Updated 5 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated last year
- Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax ☆71 · Updated 7 months ago