bethgelab / CiteME
CiteME is a benchmark designed to test the ability of language models to find the papers cited in scientific texts.
☆48 · Updated 10 months ago
Alternatives and similar repositories for CiteME
Users interested in CiteME are comparing it to the repositories listed below.
- [ACL 2024] "Large Language Models for Automated Open-domain Scientific Hypotheses Discovery". It has also received the best poster award … ☆42 · Updated 10 months ago
- Codebase accompanying the "Summary of a Haystack" paper. ☆79 · Updated 11 months ago
- Functional Benchmarks and the Reasoning Gap ☆88 · Updated 11 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆98 · Updated last week
- ☆56 · Updated 2 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆107 · Updated 8 months ago
- ☆118 · Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 6 months ago
- ☆78 · Updated 2 weeks ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆274 · Updated 10 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆96 · Updated 9 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆214 · Updated last month
- This repository contains the ScholarQABench data and evaluation pipeline. ☆79 · Updated 2 weeks ago
- The first dense retrieval model that can be prompted like an LM ☆85 · Updated 3 months ago
- ☆98 · Updated 4 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆75 · Updated 8 months ago
- Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty" ☆85 · Updated last year
- Implementation of the paper "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆61 · Updated 8 months ago
- Discovering Data-driven Hypotheses in the Wild ☆104 · Updated 2 months ago
- Official repo for the paper "PHUDGE: Phi-3 as Scalable Judge". Evaluate your LLMs with or without a custom rubric, reference answer, absolute… ☆49 · Updated last year
- Analysis code for the paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆47 · Updated 3 weeks ago
- ☆90 · Updated 7 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆168 · Updated 2 weeks ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 · Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated last year
- Public code repo for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆110 · Updated 11 months ago
- Repository for the paper "Stream of Search: Learning to Search in Language" ☆150 · Updated 6 months ago
- Evaluating LLMs with fewer examples ☆160 · Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting". ☆109 · Updated 2 months ago
- Source code for our paper "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆69 · Updated last year