AkariAsai / ScholarQABench
This repository contains the ScholarQABench data and evaluation pipeline.
★85 Updated last month
Alternatives and similar repositories for ScholarQABench
Users who are interested in ScholarQABench compare it to the repositories listed below.
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ★98 Updated 10 months ago
- Data Toolkit for Sailor Language Models ★94 Updated 7 months ago
- Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty" ★85 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ★79 Updated last year
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ★103 Updated last month
- This is the code repo for our paper "Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents". ★108 Updated 11 months ago
- Official repository for the paper "ReasonIR: Training Retrievers for Reasoning Tasks". ★202 Updated 3 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ★42 Updated 6 months ago
- ★74 Updated last year
- ★48 Updated last year
- ★155 Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ★133 Updated last year
- Code and Data for "Language Modeling with Editable External Knowledge" ★36 Updated last year
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ★124 Updated last year
- ★127 Updated last year
- Pretraining Efficiently on S2ORC! ★169 Updated 11 months ago
- Dense X Retrieval: What Retrieval Granularity Should We Use? ★161 Updated last year
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ★55 Updated last year
- Discovering Data-driven Hypotheses in the Wild ★112 Updated 3 months ago
- ★37 Updated 8 months ago
- [NAACL'24] Dataset, code and models for "TableLlama: Towards Open Large Generalist Models for Tables". ★131 Updated last year
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ★216 Updated 2 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ★148 Updated 11 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ★143 Updated 10 months ago
- BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent ★84 Updated this week
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ★201 Updated 10 months ago
- Code/data for MARG (multi-agent review generation) ★51 Updated this week
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ★48 Updated 8 months ago
- A Survey of Attributions for Large Language Models ★216 Updated last year
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024) ★83 Updated last year