WujiangXu/EPO

The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"

☆40

Alternatives and similar repositories for EPO

Users interested in EPO are comparing it to the repositories listed below.

WujiangXu / MemGym
The code for paper "MemGym: a Long-Horizon Memory Environment for LLM Agents".
☆19 · Jun 2, 2026 · Updated last month
ZhentingWang / DUMP
☆33 · May 9, 2025 · Updated last year
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 · Jun 29, 2026 · Updated last month
WujiangXu / SLMRec
The code for ICLR2025 paper "SLMRec: Empowering Small Language Models for Sequential Recommendation".
☆51 · Jun 16, 2025 · Updated last year
MinghoKwok / MemEye
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
☆48 · May 17, 2026 · Updated 2 months ago
pUmpKin-Co / ComplementaryRL
Co-evolving policy actors and experience extractors for efficient experience-driven agent RL
☆51 · May 12, 2026 · Updated 2 months ago
MingyuJ666 / LVLM-Safety
[FCS'24] LVLM Safety paper
☆19 · Jan 4, 2025 · Updated last year
MinghoKwok / DeepSieve
[EACL'26] DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router
☆108 · Jan 4, 2026 · Updated 6 months ago
shiweijiezero / R3L
☆23 · Apr 5, 2026 · Updated 3 months ago
kaiwenzha / RL-Tango
[NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
☆57 · Oct 23, 2025 · Updated 9 months ago
ablghtianyi / ICL_Modular_Arithmetic
☆19 · Mar 25, 2025 · Updated last year
MasterVito / SwS
Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning
☆42 · Nov 11, 2025 · Updated 8 months ago
MingyuJ666 / sparsityLLM
[preprint] sparsity
☆22 · Updated this week
BaohaoLiao / frac-cot
[COLM 2026] An efficient 3D sampling method for long-CoT LLM.
☆16 · May 25, 2025 · Updated last year
liumy2010 / UFT
UFT: Unifying Supervised and Reinforcement Fine-Tuning
☆31 · Jun 30, 2025 · Updated last year
YihongT / LLMSynthor
☆21 · Jul 3, 2025 · Updated last year
agiresearch / AgentRecSys
☆118 · Jan 23, 2026 · Updated 6 months ago
test-time-interaction / TTI
☆76 · Jun 10, 2025 · Updated last year
krafton-ai / lexico
KV cache compression via sparse coding
☆17 · Oct 26, 2025 · Updated 9 months ago
MingyuJ666 / Rope_with_LLM
[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…
☆87 · Jun 20, 2025 · Updated last year
TsinghuaC3I / SSRL
SSRL: Self-Search Reinforcement Learning
☆210 · Aug 20, 2025 · Updated 11 months ago
insuhan / calibquant
☆21 · Apr 3, 2025 · Updated last year
MingyuJ666 / ProLLM
[COLM'24] We propose Protein Chain of Thought (ProCoT), which replicates the biological mechanism of signaling pathways as language promp…
☆73 · Nov 23, 2025 · Updated 8 months ago
bin123apple / InfantAgent
[NeurIPS 2025] A multimodal agent that can interact with its own PC in a multimodal manner.
☆39 · Apr 23, 2026 · Updated 3 months ago
starrYYxuan / UniTE
☆17 · Nov 20, 2024 · Updated last year
WooooDyy / AgentGym-RL
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcemen…
☆822 · Feb 15, 2026 · Updated 5 months ago
qualidea1217 / HiPRAG
HiPRAG (Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation) is a reinforcement learning method designed fo…
☆26 · Oct 10, 2025 · Updated 9 months ago
zhang677 / PCL-lite
[ICML 2025] Adaptive Self-improvement LLM Agentic System for ML Library Development
☆17 · Jan 6, 2026 · Updated 6 months ago
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆446 · Jul 11, 2025 · Updated last year
haotiansun14 / BBox-Adapter
Lightweight Adapting for Black-Box Large Language Models
☆26 · Feb 15, 2024 · Updated 2 years ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆192 · May 25, 2025 · Updated last year
Gen-Verse / CURE
[NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
☆167 · Sep 19, 2025 · Updated 10 months ago
wumingqi / LLM-Math-Evaluation
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.
☆21 · Jul 18, 2025 · Updated last year
hrlics / LITE
[COLM 2024] LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models
☆14 · Jan 4, 2025 · Updated last year
liushulinle / UloRL
An Ultra-Long Output Reinforcement Learning Approach
☆23 · Jul 31, 2025 · Updated 11 months ago
Luckfort / CD
[COLING'25] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
☆82 · Jan 22, 2025 · Updated last year
thu-ml / Noise-Contrastive-Alignment
Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024)
☆59 · Nov 8, 2024 · Updated last year
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
☆64 · Apr 11, 2026 · Updated 3 months ago
sunblaze-ucb / omega
☆47 · Jun 24, 2025 · Updated last year