AkariAsai / OpenScholar_ExpertEval
This repository contains the expert evaluation interface and the data evaluation script for the OpenScholar project.
☆24 Updated 6 months ago
Alternatives and similar repositories for OpenScholar_ExpertEval
Users who are interested in OpenScholar_ExpertEval compare it to the repositories listed below.
- This repository contains ScholarQABench data and evaluation pipeline. ☆72 Updated last month
- ☆65 Updated 2 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 4 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 Updated 8 months ago
- ☆34 Updated last week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 9 months ago
- Aioli: A unified optimization framework for language model data mixing ☆25 Updated 4 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆36 Updated 3 months ago
- ☆50 Updated this week
- Verifiers for LLM Reinforcement Learning ☆56 Updated last month
- ☆41 Updated 5 months ago
- ☆19 Updated 2 months ago
- ☆20 Updated last month
- ☆21 Updated 3 months ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆29 Updated 2 months ago
- The first dense retrieval model that can be prompted like an LM ☆73 Updated 3 weeks ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆76 Updated 7 months ago
- ☆61 Updated 10 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆34 Updated last year
- PyTorch implementation for MRL ☆18 Updated last year
- Understanding the correlation between different LLM benchmarks ☆29 Updated last year
- The official repo for the code and data of paper SMART ☆26 Updated 3 months ago
- Code repo for MathAgent ☆16 Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆12 Updated 7 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 Updated 2 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆15 Updated 3 weeks ago
- ☆17 Updated last month
- ☆24 Updated 8 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 Updated last year
- ☆13 Updated 5 months ago