behavioral-data / BLADE
Code for Benchmarking Language Model Agents for Data-Driven Science
☆29 · Updated 10 months ago
Alternatives and similar repositories for BLADE
Users that are interested in BLADE are comparing it to the libraries listed below
- ☆27 · Updated 7 months ago
- Codebase for Instruction Following without Instruction Tuning ☆35 · Updated 11 months ago
- ☆17 · Updated last month
- ☆31 · Updated last year
- [NAACL 2024] Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data? https://aclanthology.org/2024.naa… ☆54 · Updated last month
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆48 · Updated 7 months ago
- ☆43 · Updated last year
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆27 · Updated last year
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆17 · Updated 4 months ago
- Automatic prompt optimization framework for multi-step agent tasks. ☆33 · Updated 9 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆94 · Updated 8 months ago
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆30 · Updated 5 months ago
- This repo explores how AMR can help address tasks that are difficult for LLMs ☆13 · Updated last year
- WideSearch: Benchmarking Agentic Broad Info-Seeking ☆80 · Updated 2 weeks ago
- Code repo for MathAgent ☆17 · Updated last year
- DataSciBench: An LLM Agent Benchmark for Data Science ☆26 · Updated 6 months ago
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆37 · Updated 2 years ago
- ☆13 · Updated 7 months ago
- ☆20 · Updated 4 months ago
- Official implementation of ICML 2025 paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https:… ☆27 · Updated 2 weeks ago
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback ☆42 · Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆27 · Updated 7 months ago
- The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models. ☆22 · Updated 9 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆40 · Updated 5 months ago
- Code/data for MARG (multi-agent review generation) ☆49 · Updated 9 months ago
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner ☆27 · Updated last year
- ☆19 · Updated 5 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… ☆12 · Updated 3 weeks ago
- ☆22 · Updated last year