xqlin98 / APOHF
Prompt Optimization with Human Feedback
☆16Updated 10 months ago
Alternatives and similar repositories for APOHF
Users who are interested in APOHF are comparing it to the repositories listed below.
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails☆23Updated 3 months ago
- A Survey of Personalization: From RAG to Agent☆43Updated last month
- [ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"☆14Updated last year
- The code of paper "Toward Optimal LLM Alignments Using Two-Player Games".☆17Updated 11 months ago
- ☆42Updated 7 months ago
- [ICML 2024] One Prompt is Not Enough: Automated Construction of a Mixture-of-Expert Prompts - TurningPoint AI☆26Updated 8 months ago
- ☆40Updated 3 weeks ago
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆58Updated 2 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆22Updated 3 months ago
- ☆24Updated last month
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$☆45Updated 7 months ago
- ☆42Updated last month
- A Sober Look at Language Model Reasoning☆63Updated last week
- ☆28Updated 7 months ago
- Can Knowledge Editing Really Correct Hallucinations? (ICLR 2025)☆16Updated 2 weeks ago
- ☆20Updated 7 months ago
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent."☆66Updated 6 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆25Updated 6 months ago
- Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment"☆69Updated last year
- [NAACL 25 main] Awesome LLM Causal Reasoning is a collection of LLM-based causal reasoning works, including papers, codes and datasets.☆63Updated 3 months ago
- ☆19Updated 9 months ago
- ☆31Updated last year
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra…☆35Updated 2 weeks ago
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.☆22Updated last year
- ☆11Updated 9 months ago
- [NeurIPS 2024] HonestLLM: Toward an Honest and Helpful Large Language Model☆26Updated 6 months ago
- official implementation of paper "Process Reward Model with Q-value Rankings"☆59Updated 4 months ago
- ☆114Updated 4 months ago
- ☆27Updated 8 months ago
- code for paper Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning☆41Updated last year