THUDM / BattleAgentBench
☆3 · Updated 3 months ago
Alternatives and similar repositories for BattleAgentBench
Users interested in BattleAgentBench are comparing it to the repositories listed below.
- Rewarded soups official implementation (☆58, updated last year)
- Direct preference optimization with f-divergences (☆13, updated 7 months ago)
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives" (☆23, updated 7 months ago)
- A repo for RLHF training and BoN over LLMs, with support for reward model ensembles (☆43, updated 4 months ago)
- Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning" (☆41, updated last year)
- Code for the NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" (☆34, updated 3 months ago)
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! (☆36, updated 10 months ago)
- Tracking literature and additional online resources on transformers for sequential decision making, including RL and beyond (☆47, updated 2 years ago)
- Code repository for the NAACL 2025 paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language…" (☆36, updated 7 months ago)
- ☆40, updated last year
- Official code for the paper "Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation" (☆20, updated last year)
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" (☆69, updated 5 months ago)
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision (☆120, updated 8 months ago)
- ☆93, updated 11 months ago
- ☆11, updated 5 months ago
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" (☆29, updated last year)
- ☆44, updated 2 years ago
- [ICML 2024] Language Models Represent Beliefs of Self and Others (☆32, updated 8 months ago)
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search", published at NeurIPS '24 (☆11, updated 3 months ago)
- Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023) (☆16, updated 5 months ago)
- Code for the paper "LLaMAR: LM-based Long-Horizon Planner for Multi-Agent Robotics" (☆13, updated 3 months ago)
- Official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP… (☆25, updated 6 months ago)
- (AAAI'24 oral) Implementation of RPPO (risk-sensitive PPO) and RPBT (population-based self-play with RPPO) (☆12, updated 2 years ago)
- General-sum variant of the game Diplomacy for evaluating AIs (☆29, updated last year)
- ☆16, updated 6 months ago
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" (☆28, updated last year)
- ☆15, updated 7 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization (☆79, updated 9 months ago)
- ☆14, updated 7 months ago
- What Makes a Reward Model a Good Teacher? An Optimization Perspective (☆31, updated last month)