behavioral-data / BLADE
Code for Benchmarking Language Model Agents for Data-Driven Science
☆24 Updated 4 months ago
Alternatives and similar repositories for BLADE:
Users who are interested in BLADE are comparing it to the libraries listed below:
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆70 Updated 3 months ago
- ☆39 Updated 7 months ago
- [EMNLP 2023] Knowledge Rumination for Pre-trained Language Models ☆17 Updated last year
- The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models. ☆18 Updated 4 months ago
- Evaluate the Quality of Critique ☆35 Updated 9 months ago
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback ☆41 Updated 11 months ago
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA… ☆19 Updated 9 months ago
- The official repo for DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph ☆14 Updated 5 months ago
- AbstainQA, ACL 2024 ☆25 Updated 5 months ago
- ☆41 Updated last year
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses". ☆27 Updated 6 months ago
- Complexity Based Prompting for Multi-Step Reasoning ☆17 Updated 2 years ago
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆15 Updated last week
- Repo for Llatrieval ☆29 Updated 6 months ago
- RuleRAG: Rule-guided Retrieval-Augmented Generation with Language Models for Question Answering ☆19 Updated 3 months ago
- This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction ☆11 Updated 4 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆35 Updated 3 weeks ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆44 Updated last month
- Code for paper "Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification" ☆13 Updated last year
- ☆20 Updated 8 months ago
- Code repo for EMNLP 2023 paper "Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models" ☆21 Updated last year
- ☆27 Updated 4 months ago
- [EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners ☆18 Updated 3 months ago
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval ☆43 Updated 4 months ago
- official repo of AAAI2024 paper Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization ☆13 Updated last year
- [NeurIPS 2021] Open Rule Induction ☆19 Updated 2 years ago
- [Findings of EMNLP'2024] Unified Active Retrieval for Retrieval Augmented Generation ☆21 Updated 5 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year