Neph0s / CoSER
Official code for "CoSER: Coordinating LLM-Based Persona Simulation of Established Roles"
☆59 Updated last month
Alternatives and similar repositories for CoSER:
Users who are interested in CoSER are comparing it to the repositories listed below:
- Unleashing the Power of Cognitive Dynamics on Large Language Models ☆61 Updated 7 months ago
- ☆47 Updated 4 months ago
- ☆56 Updated 6 months ago
- Code and Data for the paper "Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works". ☆17 Updated 9 months ago
- Official code for the paper: InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews (previo… ☆72 Updated 6 months ago
- ☆49 Updated last year
- ☆121 Updated this week
- ☆115 Updated 2 weeks ago
- Reformatted Alignment ☆115 Updated 7 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆72 Updated 2 weeks ago
- ☆94 Updated 5 months ago
- ☆36 Updated 8 months ago
- ☆81 Updated last year
- On Memorization of Large Language Models in Logical Reasoning ☆65 Updated last month
- Code and Dataset for the paper "LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically Constructed from Live Streaming" ACL … ☆35 Updated last year
- ☆37 Updated 3 weeks ago
- (ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents ☆68 Updated 3 months ago
- Awesome papers for role-playing with language models ☆186 Updated 6 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆65 Updated 5 months ago
- ☆91 Updated last year
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment ☆75 Updated 10 months ago
- ☆41 Updated 6 months ago
- ☆149 Updated last week
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆75 Updated last month
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆89 Updated 2 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (Coling2025) ☆26 Updated 4 months ago
- A Bilingual Role Evaluation Benchmark for Large Language Models ☆40 Updated last year
- The official repository of the Omni-MATH benchmark. ☆83 Updated 4 months ago
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis ☆41 Updated 2 weeks ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 Updated last year