behavioral-data / BLADE
[EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science
☆34Updated last year
Alternatives and similar repositories for BLADE
Users who are interested in BLADE are comparing it to the repositories listed below
- Codebase for Instruction Following without Instruction Tuning☆36Updated last year
- This repo explores how AMR can address tasks that are difficult for LLMs☆13Updated 2 years ago
- ☆28Updated 2 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)☆37Updated last year
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"☆40Updated 10 months ago
- ☆31Updated last year
- ACL 2023 (Findings) - BertNet: Harvesting Knowledge Graphs from Pretrained Language Models☆107Updated last year
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination"☆27Updated last year
- Evaluation on Logical Reasoning and Abstract Reasoning Challenges☆29Updated 9 months ago
- Aioli: A unified optimization framework for language model data mixing☆32Updated last year
- [NAACL 2024] Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data? https://aclanthology.org/2024.naa…☆55Updated 6 months ago
- Evaluate the Quality of Critique☆36Updated last year
- ☆49Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆102Updated last year
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data☆45Updated 11 months ago
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆102Updated 5 months ago
- ☆18Updated 6 months ago
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".☆30Updated last year
- ☆30Updated last year
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆63Updated last year
- ☆20Updated 9 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆28Updated last week
- Instructions and demonstrations for building a GLM capable of formal logical reasoning☆55Updated last year
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval☆15Updated 2 years ago
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs☆41Updated last year
- Optimize Any User-defined Compound AI Systems☆66Updated 5 months ago
- Code repo for EMNLP 2023 paper "Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models"☆23Updated 2 years ago
- ☆35Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)☆48Updated last year
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆33Updated last year