junkangwu / QAE
Quantile Advantage Estimation for Entropy-Safe Reasoning
☆19 Updated last month
Alternatives and similar repositories for QAE
Users who are interested in QAE are comparing it to the repositories listed below
- [NeurIPS 2024] The implementation of paper "On Softmax Direct Preference Optimization for Recommendation" ☆88 Updated last year
- [TMLR 2025] A general framework for bridging LLMs and recommendation systems via reinforcement learning. https://arxiv.org/pdf/2503.24289 ☆119 Updated 3 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆49 Updated last year
- Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving(EMNLP2024 Findings)" ☆34 Updated last year
- ☆76 Updated 2 weeks ago
- ☆23 Updated 2 years ago
- ☆30 Updated 2 months ago
- ☆26 Updated last year
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆93 Updated last year
- This is the repo for the survey of Bias and Fairness in IR with LLMs. ☆59 Updated 2 months ago
- Official Implementation of "Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning" at EMNLP 2024 Main Conf… ☆41 Updated 4 months ago
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆70 Updated 4 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆79 Updated 3 weeks ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆63 Updated last year
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆151 Updated 5 months ago
- A research repo for experiments about Reinforcement Finetuning ☆52 Updated 7 months ago
- Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples ☆44 Updated 4 months ago
- The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents" ☆87 Updated 2 months ago
- This is the official implementation of the paper "Generative Retrieval with Semantic Tree-Structured Item Identifiers via Contrastive Lea… ☆22 Updated 11 months ago
- content-neutral dataset of logical reasoning ☆18 Updated 8 months ago
- ☆51 Updated last year
- Language Models as Semantic Indexers (ICML 2024) ☆37 Updated last year
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆132 Updated 8 months ago
- ☆117 Updated 2 weeks ago
- Official Code of our AAAI-24 Paper: "Generative Multi-modal Knowledge Retrieval with Large Language Models". ☆29 Updated 2 months ago
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect… ☆128 Updated last month
- ☆18 Updated last month
- ☆16 Updated last year
- Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to … ☆26 Updated this week
- ☆126 Updated last week