behavioral-data / BLADE
Code for Benchmarking Language Model Agents for Data-Driven Science
☆18 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for BLADE
- [EMNLP 2023] Knowledge Rumination for Pre-trained Language Models ☆17 · Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆32 · Updated last month
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆36 · Updated 5 months ago
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods. ☆23 · Updated last year
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 · Updated last year
- AbstainQA, ACL 2024 ☆19 · Updated last month
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses". ☆26 · Updated 3 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆30 · Updated 3 months ago
- Official repo of the AAAI 2024 paper "Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization" ☆13 · Updated 10 months ago
- Repo for Llatrieval ☆28 · Updated 3 months ago
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions ☆40 · Updated 4 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆30 · Updated 6 months ago
- Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking ☆16 · Updated last year
- Evaluate the Quality of Critique ☆35 · Updated 5 months ago
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆57 · Updated 3 weeks ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆36 · Updated 8 months ago
- Tasks for describing differences between text distributions. ☆16 · Updated 3 months ago
- Official Code Repository for [AutoScale–Automatic Prediction of Compute-optimal Data Compositions for Training LLMs] ☆8 · Updated last month
- Towards Systematic Measurement for Long Text Quality ☆28 · Updated 2 months ago
- Code for paper "Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification" ☆14 · Updated last year
- MAIR: A Massive Benchmark for Evaluating Instructed Retrieval. Evaluate your retrieval models on 126 diverse tasks. [EMNLP 2024] ☆13 · Updated 2 weeks ago
- This is the repository for paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models" ☆22 · Updated last year
- Adding new tasks to T0 without catastrophic forgetting ☆30 · Updated 2 years ago
- The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models. ☆15 · Updated 2 weeks ago
- Code implementation of synthetic continued pretraining ☆60 · Updated last month