AkariAsai / OpenScholar_ExpertEval
This repository contains the expert evaluation interface and data evaluation script for the OpenScholar project.
☆28Updated last year
Alternatives and similar repositories for OpenScholar_ExpertEval
Users who are interested in OpenScholar_ExpertEval are comparing it to the repositories listed below.
- ☆40Updated 5 months ago
- This repository contains ScholarQABench data and evaluation pipeline.☆85Updated 3 months ago
- Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"☆55Updated 3 months ago
- ☆35Updated 6 months ago
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models".☆84Updated 7 months ago
- ☆51Updated 6 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆101Updated this week
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning☆72Updated last week
- ☆67Updated 7 months ago
- Source code for the collaborative reasoner research project at Meta FAIR.☆105Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper.☆79Updated last year
- ☆51Updated last year
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval☆34Updated 3 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆76Updated 11 months ago
- [ICLR'25] ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts☆51Updated last year
- ☆60Updated 4 months ago
- ☆40Updated 11 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆60Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation☆29Updated 9 months ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning☆65Updated last year
- Automated Qualitative Analysis of LLMs (ICLR 2025)☆51Updated 4 months ago
- Verifiers for LLM Reinforcement Learning☆79Updated 7 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆112Updated 5 months ago
- ☆84Updated 2 weeks ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…☆27Updated 11 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆101Updated 11 months ago
- ☆86Updated 2 weeks ago
- MIRIAD is a million-scale Medical Instruction and RetrIeval Dataset☆128Updated 2 months ago
- ☆25Updated this week
- ☆82Updated this week