wantbook-book / SeRL
SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
☆14Updated 2 months ago
Alternatives and similar repositories for SeRL
Users who are interested in SeRL are comparing it to the libraries listed below
- [NeurIPS 2024] Official implementation for paper "Can Graph Learning Improve Planning in LLM-based Agents?"☆146Updated 6 months ago
- Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving" (EMNLP 2024 Findings)☆34Updated last year
- ☆72Updated 3 weeks ago
- ☆66Updated 11 months ago
- Enhances Overleaf by allowing article searches and BibTeX retrieval from DBLP and Google Scholar☆114Updated 7 months ago
- Code repo for "LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners"☆54Updated 6 months ago
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆90Updated 3 weeks ago
- RFTT: Reasoning with Reinforced Functional Token Tuning☆29Updated 5 months ago
- ☆91Updated 8 months ago
- [NeurIPS 2024] GITA: Graph to Image-Text Integration for Vision-Language Graph Reasoning☆52Updated last year
- A research repo for experiments about Reinforcement Finetuning☆52Updated 7 months ago
- ☆76Updated 2 weeks ago
- A Survey of Direct Preference Optimization (DPO)☆82Updated 4 months ago
- Reinforced Multi-LLM Agents training☆59Updated 5 months ago
- Official repository of paper "Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models"☆23Updated 6 months ago
- ☆38Updated 3 months ago
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution"☆54Updated 2 months ago
- A Survey of Personalization: From RAG to Agent☆87Updated 3 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …☆97Updated 11 months ago
- [ACL'24] Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correla…☆46Updated 6 months ago
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents☆196Updated 3 weeks ago
- Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"☆42Updated last year
- ☆50Updated last year
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆92Updated last year
- Accepted LLM Papers in NeurIPS 2024☆37Updated last year
- On Memorization of Large Language Models in Logical Reasoning☆72Updated 8 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$☆49Updated last year
- [NAACL 25 main] Awesome LLM Causal Reasoning is a collection of LLM-based causal reasoning works, including papers, codes and datasets.☆104Updated 2 months ago
- [EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning☆60Updated 3 weeks ago
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆82Updated 5 months ago