behavioral-data / BLADE
[EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science
☆33Updated last year
Alternatives and similar repositories for BLADE
Users who are interested in BLADE are comparing it to the repositories listed below
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆32Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)☆48Updated 10 months ago
- Automatic prompt optimization framework for multi-step agent tasks.☆35Updated last year
- Codebase for Instruction Following without Instruction Tuning☆36Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆37Updated last year
- Evaluate the Quality of Critique☆36Updated last year
- This repo explores how to use AMR to address tasks that are difficult for LLMs☆13Updated last year
- Instructions and demonstrations for building a GLM capable of formal logical reasoning☆55Updated last year
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…☆29Updated last year
- Aioli: A unified optimization framework for language model data mixing☆28Updated 10 months ago
- ACL 2023 (Findings) - BertNet: Harvesting Knowledge Graphs from Pretrained Language Models☆107Updated last year
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"☆36Updated 7 months ago
- [NAACL 2024] Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data? https://aclanthology.org/2024.naa…☆55Updated 3 months ago
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding☆27Updated last year
- [NAACL 2024] Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models☆86Updated last year
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback☆42Updated last year
- The official repo for DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph☆18Updated last year
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA…☆27Updated last year
- Evaluation on Logical Reasoning and Abstract Reasoning Challenges☆29Updated 7 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆101Updated 11 months ago
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination"☆26Updated 11 months ago
- Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"☆27Updated last year
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆60Updated last year
- Adding new tasks to T0 without catastrophic forgetting☆33Updated 3 years ago
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting"☆81Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated last year
- Source code for GreaTer ICLR 2025 - Gradient Over Reasoning makes Smaller Language Models Strong Prompt Optimizers☆33Updated 7 months ago