xxxiaol / QRData
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
☆28Updated last month
Related projects: ⓘ
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval☆41Updated last month
- IntructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc…☆28Updated 3 months ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments☆32Updated 8 months ago
- Evaluate the Quality of Critique☆35Updated 3 months ago
- Benchmarking Benchmark Leakage in Large Language Models☆39Updated 4 months ago
- the instructions and demonstrations for building a formal logical reasoning capable GLM☆49Updated 2 weeks ago
- Code and data for paper "Context-faithful Prompting for Large Language Models".☆37Updated last year
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions☆37Updated 2 months ago
- AbstainQA, ACL 2024☆17Updated 3 weeks ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆45Updated 6 months ago
- LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆26Updated 5 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆46Updated 5 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆33Updated 6 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆38Updated 2 months ago
- Supporting code for ReCEval paper☆26Updated this week
- ☆39Updated 9 months ago
- Source code and data for The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code (Findings of ACL 2023…☆28Updated last year
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.☆58Updated 2 months ago
- ☆32Updated 5 months ago
- Tasks for describing differences between text distributions.☆15Updated last month
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆40Updated 8 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"☆22Updated 3 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆54Updated 6 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models☆42Updated last week
- Scalable Meta-Evaluation of LLMs as Evaluators☆39Updated 7 months ago
- ☆52Updated 7 months ago
- "FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023)☆13Updated last year
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"☆62Updated 3 months ago
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning☆24Updated last month
- Lightweight tool to identify Data Contamination in LLMs evaluation☆39Updated 6 months ago