behavioral-data / BLADE
Code for Benchmarking Language Model Agents for Data-Driven Science
☆26 Updated 6 months ago
Alternatives and similar repositories for BLADE:
Users who are interested in BLADE are comparing it to the repositories listed below.
- [EMNLP 2023] Knowledge Rumination for Pre-trained Language Models ☆17 Updated last year
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback ☆41 Updated last year
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆26 Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆25 Updated 3 months ago
- A Comprehensive Library for Memory of LLM-based Agents. ☆15 Updated 2 months ago
- Code for paper "Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification" ☆14 Updated last year
- Code repo for EMNLP 2023 paper "Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models" ☆22 Updated last year
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA… ☆20 Updated 11 months ago
- Code/data for MARG (multi-agent review generation) ☆43 Updated 5 months ago
- This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction ☆13 Updated 6 months ago
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses". ☆29 Updated 8 months ago
- Code for the 2024 arXiv publication "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Mo… ☆24 Updated 10 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆34 Updated 7 months ago
- SRTK: Retrieve semantic-relevant subgraphs from large-scale knowledge graphs ☆27 Updated 7 months ago
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆16 Updated last month
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆84 Updated 5 months ago
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆17 Updated last month
- Code and Data for "Language Modeling with Editable External Knowledge" ☆32 Updated 10 months ago
- Instructions and demonstrations for building a GLM capable of formal logical reasoning ☆53 Updated 8 months ago
- Tasks for describing differences between text distributions. ☆16 Updated 9 months ago
- ✨ Resolving Knowledge Conflicts in Large Language Models, COLM 2024 ☆16 Updated 7 months ago
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs ☆34 Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆12 Updated 6 months ago
- Evaluate the Quality of Critique ☆34 Updated 11 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆36 Updated 4 months ago