behavioral-data / BLADE

Code for Benchmarking Language Model Agents for Data-Driven Science

☆28

Alternatives and similar repositories for BLADE

Users who are interested in BLADE are comparing it to the repositories listed below.

HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆27 Updated 5 months ago
xxxiaol / QRData
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
☆41 Updated 4 months ago
raspberryice / inc-schema
Code for paper "Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification"
☆15 Updated 2 years ago
gersteinlab / Struc-Bench
☆54 Updated last year
csitfun / LogiCoT
The instructions and demonstrations for building a GLM capable of formal logical reasoning
☆53 Updated 10 months ago
OSU-NLP-Group / In-Context-Reranking
[ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"
☆25 Updated 3 months ago
RenzeLou / AAAR-1.0
The source code for running LLMs on the AAAR-1.0 benchmark.
☆16 Updated 3 months ago
chenzhongwu20 / RuleRAG_ICL_FT
RuleRAG: Rule-guided Retrieval-Augmented Generation with Language Models for Question Answering
☆22 Updated 8 months ago
general-preference / general-preference-model
Official implementation of ICML 2025 paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https:…
☆25 Updated 2 months ago
bowen-upenn / llm_token_bias
[EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners
☆24 Updated 7 months ago
AngelaZZZ-611 / reasoning_models_probing
☆12 Updated 2 months ago
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆36 Updated last year
orionw / FollowIR
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
☆45 Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35 Updated 9 months ago
zjunlp / TRICE
[NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback
☆41 Updated last year
psunlpgroup / ReaLMistake
This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".
☆30 Updated 10 months ago
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆50 Updated last month
nlp-uoregon / ullme
☆20 Updated 3 months ago
princeton-nlp / InstructEval
[NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.
☆22 Updated last year
SalesforceAIResearch / FoFo
☆24 Updated 6 months ago
MurongYue / LLM_MoT_cascade
This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA…
☆24 Updated last year
DAMO-NLP-SG / CaRing
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs
☆37 Updated last year
dinobby / MAGDi
The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…
☆35 Updated last year
FreedomIntelligence / PlatoLM
A trainable user simulator
☆34 Updated 2 weeks ago
Reason-Wang / NAT
[NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…
☆26 Updated last year
DualityRL / multi-attempt
☆19 Updated 4 months ago
oashua / MathAgent
Code repo for MathAgent
☆17 Updated last year
icip-cas / SelfRetrieval
☆33 Updated 8 months ago
allenai / marg-reviewer
Code/data for MARG (multi-agent review generation)
☆44 Updated 8 months ago
zjunlp / knowledge-rumination
[EMNLP 2023] Knowledge Rumination for Pre-trained Language Models
☆17 Updated 2 years ago