HITsz-TMG / Multi-agent-peer-review
Official implementation of our paper "Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration".
⭐13 · Updated 5 months ago
Alternatives and similar repositories for Multi-agent-peer-review
Users interested in Multi-agent-peer-review are comparing it to the repositories listed below.
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ⭐69 · Updated last year
- Code for our EMNLP 2023 paper "Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models" ⭐50 · Updated last year
- [NeurIPS 2023] Source code for "Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory" ⭐59 · Updated last year
- Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey" ⭐59 · Updated last year
- Code and data for the NeurIPS 2021 paper "A Dataset for Answering Time-Sensitive Questions" ⭐69 · Updated 3 years ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ⭐57 · Updated 6 months ago
- Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Datasets and Benchmarks Track) ⭐42 · Updated 6 months ago
- AbstainQA, ACL 2024 ⭐25 · Updated 7 months ago
- Repository for the paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models" ⭐24 · Updated last year
- Towards Systematic Measurement for Long Text Quality ⭐34 · Updated 8 months ago
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ⭐114 · Updated last month
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper "R-Tuning: Instructing Large Language Models to Say 'I Don't Know'" ⭐111 · Updated 10 months ago
- Enhancing contextual understanding in large language models through contrastive decoding ⭐17 · Updated last year
- Code for the arXiv preprint "MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts" ⭐20 · Updated last year
- Code and data for the paper "Context-faithful Prompting for Large Language Models" ⭐39 · Updated 2 years ago
- Merging Generated and Retrieved Knowledge for Open-Domain QA (EMNLP 2023) ⭐22 · Updated last year
- Code for the COLING 2022 long paper "Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder" ⭐22 · Updated 2 years ago
- [ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling" ⭐25 · Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ⭐59 · Updated last year
- [ACL 2023] Code and data repository for the paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)" ⭐53 · Updated last year
- [ACL'24] Code and data for the paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ⭐54 · Updated last year
- The Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K by adding irrelevant sentences to the problem descriptions ⭐60 · Updated 2 years ago
- WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000+ "why" question-answer-rationale triples ⭐47 · Updated last year
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ⭐80 · Updated last year
- A comprehensive paper list for Table-based Question Answering ⭐27 · Updated last year