CUHK-ARISE / GAMABench
Benchmarking LLMs' Gaming Ability in Multi-Agent Environments
☆33 · Updated this week
Related projects:
- Domain-specific preference (DSP) data and customized RM fine-tuning. ☆24 · Updated 6 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆86 · Updated 3 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆57 · Updated 7 months ago
- Evaluate the Quality of Critique ☆35 · Updated 3 months ago
- Benchmarking LLMs' Emotional Alignment with Humans ☆60 · Updated last month
- The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agen… ☆20 · Updated 6 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆45 · Updated 6 months ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models ☆33 · Updated 9 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆62 · Updated 3 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆78 · Updated last week
- Multilingual safety benchmark for Large Language Models ☆21 · Updated 2 weeks ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆28 · Updated 8 months ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. ☆41 · Updated 10 months ago
- Code for the paper "SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning" ☆39 · Updated last year
- This is the official repository for the paper "EmoBench: Evaluating the Emotional Intelligence of Large Language Models" ☆39 · Updated 6 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆49 · Updated 3 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆48 · Updated 4 months ago
- Knowledge Circuits in Pretrained Transformers ☆46 · Updated this week
- Evaluating Mathematical Reasoning Beyond Accuracy ☆32 · Updated 5 months ago
- Paper collections of methods that use language to interact with environments, including interacting with the real world, simulated worlds, or WWW… ☆121 · Updated last year
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆48 · Updated 6 months ago
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios ☆36 · Updated 5 months ago
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA… ☆30 · Updated 2 months ago
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023) ☆20 · Updated 9 months ago
- The code for "Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning" ☆15 · Updated 6 months ago