yihedeng9 / rlhf-summary-notes
A brief and partial summary of RLHF algorithms.
★ 89 · Updated last month
Alternatives and similar repositories for rlhf-summary-notes:
Users interested in rlhf-summary-notes are comparing it to the libraries listed below:
- [NeurIPS'24 Spotlight] Observational Scaling Laws ★ 49 · Updated 3 months ago
- ★ 93 · Updated 6 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ★ 88 · Updated last month
- ★ 125 · Updated last month
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ★ 90 · Updated last month
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ★ 130 · Updated 3 months ago
- ★ 58 · Updated 8 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ★ 111 · Updated 4 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ★ 88 · Updated 2 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ★ 145 · Updated last month
- 🌾 OAT: Online AlignmenT for LLMs ★ 81 · Updated 3 weeks ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ★ 102 · Updated 6 months ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ★ 94 · Updated 10 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ★ 170 · Updated 5 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ★ 41 · Updated 5 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ★ 44 · Updated last month
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning". ★ 42 · Updated last month
- ★ 69 · Updated this week
- The official repository of the Omni-MATH benchmark. ★ 66 · Updated 3 weeks ago
- ★ 78 · Updated 10 months ago
- ★ 85 · Updated last year
- A curated list of awesome resources dedicated to Scaling Laws for LLMs ★ 69 · Updated last year
- [NeurIPS-2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ★ 75 · Updated 3 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ★ 71 · Updated 7 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ★ 57 · Updated 2 weeks ago
- Language models scale reliably with over-training and on downstream tasks ★ 96 · Updated 9 months ago
- ★ 56 · Updated 4 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ★ 64 · Updated 3 weeks ago
- Directional Preference Alignment ★ 54 · Updated 3 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ★ 145 · Updated 8 months ago