vsamuel2003 / PersonaGym
☆18 Updated last month
Alternatives and similar repositories for PersonaGym
Users that are interested in PersonaGym are comparing it to the libraries listed below
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆40 Updated 2 years ago
- AbstainQA, ACL 2024 ☆26 Updated 8 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment" ☆73 Updated last month
- ☆86 Updated 7 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆107 Updated last year
- Evaluate the Quality of Critique ☆35 Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆69 Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆50 Updated 3 weeks ago
- ☆25 Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆47 Updated 5 months ago
- ☆36 Updated 5 months ago
- ☆29 Updated 11 months ago
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions ☆44 Updated 11 months ago
- ☆19 Updated last year
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023. ☆41 Updated last year
- Code implementation of synthetic continued pretraining ☆114 Updated 5 months ago
- ☆45 Updated 10 months ago
- Code repo for EMNLP 2023 paper "Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models" ☆22 Updated last year
- ☆72 Updated last year
- the instructions and demonstrations for building a formal logical reasoning capable GLM ☆53 Updated 9 months ago
- ☆44 Updated 10 months ago
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆45 Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆89 Updated 6 months ago
- Codebase for Instruction Following without Instruction Tuning ☆34 Updated 9 months ago
- Learning adapter weights from task descriptions ☆19 Updated last year
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆65 Updated last year
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆60 Updated 2 years ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆54 Updated last year