behavioral-data / BLADE
Code for Benchmarking Language Model Agents for Data-Driven Science
☆33 Updated last year
Alternatives and similar repositories for BLADE
Users who are interested in BLADE are comparing it to the libraries listed below.
- ☆27 Updated 9 months ago
- Evaluation on Logical Reasoning and Abstract Reasoning Challenges ☆29 Updated 6 months ago
- ACL 2023 (Findings) - BertNet: Harvesting Knowledge Graphs from Pretrained Language Models ☆107 Updated last year
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆29 Updated last year
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆78 Updated last year
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆36 Updated 7 months ago
- [NAACL 2024] Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data? https://aclanthology.org/2024.naa… ☆55 Updated 3 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆37 Updated last year
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆32 Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆99 Updated 11 months ago
- The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models. ☆23 Updated 11 months ago
- ☆46 Updated last year
- the instructions and demonstrations for building a formal logical reasoning capable GLM ☆54 Updated last year
- ☆15 Updated last year
- Optimize Any User-defined Compound AI Systems ☆59 Updated 2 months ago
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆40 Updated 2 years ago
- [EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners ☆25 Updated 10 months ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆49 Updated 9 months ago
- [ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search". ☆33 Updated 2 months ago
- ☆31 Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆36 Updated last year
- [EMNLP 2023] Knowledge Rumination for Pre-trained Language Models ☆17 Updated 2 years ago
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding ☆27 Updated last year
- ☆17 Updated 3 months ago
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" ☆25 Updated 10 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆43 Updated 8 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 Updated last year
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆37 Updated 2 years ago
- Aioli: A unified optimization framework for language model data mixing ☆28 Updated 9 months ago
- Few-shot Learning with Auxiliary Data ☆31 Updated last year