AkariAsai / OpenScholar_ExpertEval
This repository contains the expert evaluation interface and the data evaluation script for the OpenScholar project.
☆24 Updated 5 months ago
Alternatives and similar repositories for OpenScholar_ExpertEval:
Users who are interested in OpenScholar_ExpertEval are comparing it to the repositories listed below.
- This repository contains ScholarQABench data and evaluation pipeline. ☆71 Updated 2 weeks ago
- Implementation of "SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models" ☆27 Updated 2 months ago
- ☆62 Updated 3 weeks ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆35 Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆77 Updated 7 months ago
- Code and Data for "Language Modeling with Editable External Knowledge" ☆32 Updated 10 months ago
- Aioli: A unified optimization framework for language model data mixing ☆23 Updated 3 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 7 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 2 months ago
- ☆24 Updated last month
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆86 Updated last month
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆76 Updated 6 months ago
- ☆45 Updated 3 weeks ago
- ☆51 Updated last week
- ☆19 Updated 2 weeks ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆53 Updated 6 months ago
- Agentic Knowledgeable Self-awareness ☆50 Updated last week
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆52 Updated 3 weeks ago
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F… ☆62 Updated 10 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆81 Updated 2 weeks ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆39 Updated last month
- ☆20 Updated last month
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆83 Updated 4 months ago
- Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation ☆29 Updated 2 months ago
- ☆58 Updated 9 months ago
- Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref… ☆48 Updated last month
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆36 Updated 3 months ago
- ☆24 Updated 7 months ago
- Official Repository of Are Your LLMs Capable of Stable Reasoning? ☆25 Updated last month
- ☆62 Updated 9 months ago