tianjunz / HIR
☆158 Updated last year
Related projects
Alternatives and complementary repositories for HIR
- Chain-of-Hindsight, A Scalable RLHF Method ☆220 Updated last year
- ☆94 Updated last year
- A repository for transformer critique learning and generation ☆86 Updated 11 months ago
- Self-Alignment with Principle-Following Reward Models ☆148 Updated 8 months ago
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆177 Updated 9 months ago
- RLHF implementation details of OAI's 2019 codebase ☆152 Updated 10 months ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆155 Updated 6 months ago
- ☆114 Updated 4 months ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆104 Updated 5 months ago
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ☆199 Updated last year
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆201 Updated last year
- Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning". ☆145 Updated 10 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆127 Updated 6 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆105 Updated 7 months ago
- ☆175 Updated last year
- ☆259 Updated 11 months ago
- ☆221 Updated last year
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆58 Updated 3 months ago
- ☆63 Updated 2 years ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆98 Updated last year
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆213 Updated last year
- A framework for few-shot evaluation of autoregressive language models. ☆101 Updated last year
- DSIR large-scale data selection framework for language model training ☆230 Updated 7 months ago
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following ☆79 Updated 2 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆49 Updated 5 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆26 Updated 5 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆80 Updated last week
- A (somewhat) minimal library for finetuning language models with PPO on human feedback. ☆86 Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆108 Updated last year
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆213 Updated last year