3B-Group / ConvRe
🤯ConvRe🤯: An Investigation of LLMs' Inefficacy in Understanding Converse Relations (EMNLP 2023)
☆23 Updated last year
Alternatives and similar repositories for ConvRe:
Users who are interested in ConvRe are comparing it to the repositories listed below.
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning". ☆64 Updated 2 years ago
- ☆15 Updated last year
- [ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks ☆26 Updated 6 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆59 Updated 9 months ago
- Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning ☆36 Updated last year
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism ☆28 Updated 9 months ago
- Evaluate the Quality of Critique ☆34 Updated 10 months ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆109 Updated last year
- [ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials ☆28 Updated last month
- ☆41 Updated last year
- A Portal Site for Structured Knowledge Grounding (SKG) Resources. ☆9 Updated 2 years ago
- ☆25 Updated 2 years ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆24 Updated last year
- ☆23 Updated 10 months ago
- ☆48 Updated last year
- ☆29 Updated 3 months ago
- [EMNLP 2022] TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data ☆17 Updated last year
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023. ☆63 Updated 4 months ago
- ☆59 Updated 7 months ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆60 Updated 2 years ago
- Supporting code for ReCEval paper ☆28 Updated 7 months ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective ☆30 Updated last year
- Visual and Embodied Concepts evaluation benchmark ☆21 Updated last year
- Evaluation on Logical Reasoning and Abstract Reasoning Challenges ☆26 Updated last year
- ☆73 Updated 10 months ago
- ☆12 Updated 5 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆47 Updated 3 months ago
- [ICML 2024] Self-Infilling Code Generation ☆19 Updated 11 months ago
- Domain-specific preference (DSP) data and customized RM fine-tuning. ☆25 Updated last year
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 Updated 2 years ago