AkariAsai / OpenScholar_ExpertEval
This repository contains the expert evaluation interface and data evaluation script for the OpenScholar project.
☆29Updated last year
Alternatives and similar repositories for OpenScholar_ExpertEval
Users who are interested in OpenScholar_ExpertEval are comparing it to the repositories listed below.
- Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"☆55Updated 5 months ago
- ☆67Updated 9 months ago
- This repository contains ScholarQABench data and evaluation pipeline.☆93Updated 4 months ago
- [ICLR'25] ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts☆52Updated last year
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models".☆86Updated 9 months ago
- ☆41Updated 7 months ago
- ☆39Updated 7 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…☆27Updated last year
- ☆93Updated 2 months ago
- When Reasoning Meets Its Laws☆33Updated last week
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …☆42Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation☆29Updated 11 months ago
- ☆39Updated last year
- ☆55Updated last year
- ☆54Updated last year
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval☆37Updated 5 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning☆87Updated last month
- Codebase accompanying the Summary of a Haystack paper.☆80Updated last year
- ☆63Updated 6 months ago
- Verifiers for LLM Reinforcement Learning☆79Updated 8 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆101Updated 4 months ago
- Source code for the collaborative reasoner research project at Meta FAIR.☆111Updated 8 months ago
- ☆59Updated last year
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆119Updated 7 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆250Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆61Updated last year
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning☆110Updated last month
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆91Updated 11 months ago
- Automated Qualitative Analysis of LLMs (ICLR 2025)☆53Updated 6 months ago
- [ACL 2025] Analyzing LLMs' Multilingual Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations☆14Updated 2 months ago