behavioral-data / BLADE
Code for Benchmarking Language Model Agents for Data-Driven Science
☆24 · Updated 4 months ago
Alternatives and similar repositories for BLADE:
Users interested in BLADE are comparing it to the libraries listed below.
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback ☆41 · Updated last year
- Code/data for MARG (multi-agent review generation) ☆41 · Updated 4 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆70 · Updated 3 months ago
- [EMNLP 2023] Knowledge Rumination for Pre-trained Language Models ☆17 · Updated last year
- The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agen… ☆24 · Updated last year
- AbstainQA, ACL 2024 ☆25 · Updated 5 months ago
- Efficient retrieval head analysis with triton flash attention that supports topK probability ☆12 · Updated 9 months ago
- The official repo for DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph ☆14 · Updated 5 months ago
- Evaluate the Quality of Critique ☆35 · Updated 9 months ago
- ACL24 ☆9 · Updated 9 months ago
- ☆22 · Updated 3 months ago
- Complexity Based Prompting for Multi-Step Reasoning ☆17 · Updated 2 years ago
- Codebase for Instruction Following without Instruction Tuning ☆33 · Updated 6 months ago
- ☆23 · Updated 2 months ago
- ☆39 · Updated 7 months ago
- [Findings of EMNLP 2023] Code for the paper "Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation". ☆30 · Updated last year
- Code repo for MathAgent ☆15 · Updated last year
- ☆12 · Updated last year
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆35 · Updated last month
- The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models. ☆18 · Updated 4 months ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement) ☆48 · Updated last year
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 · Updated 2 years ago
- LongHeads: Multi-Head Attention is Secretly a Long Context Processor ☆29 · Updated 11 months ago
- Towards Systematic Measurement for Long Text Quality ☆33 · Updated 6 months ago
- ☆15 · Updated 9 months ago
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" ☆21 · Updated 3 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆37 · Updated this week
- ☆28 · Updated 6 months ago
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆35 · Updated last year