Singla17/dynamic-alignment-optimization
[EMNLP'24 (Main)] DRPO (Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. DRPO leverages a search-based optimization framework that allows LLMs to iteratively self-improve and design the best alignment instructions without the need for additional training.
☆23 · Updated 4 months ago
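As a rough, hypothetical illustration of the idea described above, the sketch below shows what a tuning-free search loop of this kind can look like: an LLM-as-judge call provides the dynamic reward, another LLM call revises the alignment instruction, and no model weights are ever updated. The `llm` placeholder, the helper names, and the prompt wording are assumptions for illustration only, not the repository's actual API; see the paper and code for the real method.

```python
import random

def llm(prompt: str) -> str:
    """Stand-in for any chat-completion call; plug in a real model here (assumption)."""
    raise NotImplementedError

def dynamic_reward(query: str, response: str) -> float:
    """LLM judge scores the response; the judge also picks which alignment
    criteria matter for this query, which is what makes the reward 'dynamic'."""
    verdict = llm(
        f"Query: {query}\nResponse: {response}\n"
        "Choose the alignment criteria most relevant to this query and rate the "
        "response against them from 1 to 10. Reply with the number only."
    )
    try:
        return float(verdict.strip())
    except ValueError:
        return 0.0

def optimize_instruction(seed: str, queries: list[str], steps: int = 10) -> str:
    """Search over alignment instructions, keeping the best-scoring variant."""
    best, best_score = seed, -1.0
    for _ in range(steps):
        query = random.choice(queries)
        # Answer with the current instruction, then ask the model to critique itself.
        response = llm(f"{best}\n\nUser: {query}\nAssistant:")
        feedback = llm(
            f"Query: {query}\nResponse: {response}\n"
            "Briefly list the response's main alignment weaknesses."
        )
        # Propose a revised instruction that addresses the critique.
        candidate = llm(
            f"Current instruction:\n{best}\n\nFeedback:\n{feedback}\n\n"
            "Rewrite the instruction to fix these weaknesses. "
            "Reply with the revised instruction only."
        )
        # Keep the candidate only if it scores better under the dynamic reward.
        cand_response = llm(f"{candidate}\n\nUser: {query}\nAssistant:")
        score = dynamic_reward(query, cand_response)
        if score > best_score:
            best, best_score = candidate, score
    return best
```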
Alternatives and similar repositories for dynamic-alignment-optimization:
Users interested in dynamic-alignment-optimization are comparing it to the repositories listed below.
- Evaluate the Quality of Critique ☆34 · Updated 9 months ago
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 · Updated 2 years ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement) ☆48 · Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆45 · Updated 3 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆55 · Updated 8 months ago
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" ☆52 · Updated 9 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Updated last year
- Instructions and demonstrations for building a GLM capable of formal logical reasoning ☆53 · Updated 6 months ago
- Trending projects & awesome papers about data-centric LLM studies. ☆33 · Updated 2 months ago
- Towards Systematic Measurement for Long Text Quality ☆34 · Updated 6 months ago
- Evaluation on Logical Reasoning and Abstract Reasoning Challenges ☆25 · Updated last year
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆22 · Updated last year
- This is the repository for paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models" ☆23 · Updated last year
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023. ☆62 · Updated 4 months ago
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding ☆27 · Updated last year
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆36 · Updated 2 months ago
- [COLM'24] "How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?" ☆21 · Updated 5 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated last year
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆54 · Updated 5 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆29 · Updated 9 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆67 · Updated 11 months ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 · Updated 2 years ago
- Official implementation of AAAI 2025 paper "Augmenting Math Word Problems via Iterative Question Composing" (https://arxiv.org/abs/2401.09…) ☆19 · Updated 3 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆24 · Updated last year