bethgelab / CiteME
CiteME is a benchmark designed to test the ability of language models to find the papers cited in scientific texts.
☆48 · Updated 9 months ago
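To make the task concrete, here is a minimal sketch of a CiteME-style evaluation loop: the model is shown an excerpt whose citation has been removed and must name the cited paper, which is then scored against the gold title. This is a hypothetical illustration only; `Example`, `normalize`, `evaluate`, and the `ask_model` callable are placeholders, not the repository's actual API.

```python
# Hypothetical sketch of a CiteME-style evaluation loop (not the repo's API):
# each example pairs a citing excerpt with the title of the paper it cites.

from dataclasses import dataclass


@dataclass
class Example:
    excerpt: str       # citing sentence with the reference removed
    target_title: str  # title of the paper the excerpt actually cites


def normalize(title: str) -> str:
    """Lowercase and strip punctuation so title matching is lenient."""
    return "".join(
        ch for ch in title.lower() if ch.isalnum() or ch.isspace()
    ).strip()


def evaluate(examples, ask_model) -> float:
    """Fraction of excerpts for which the model names the correct paper."""
    hits = 0
    for ex in examples:
        prediction = ask_model(
            "Which paper is being cited in the following excerpt? "
            "Answer with the paper title only.\n\n" + ex.excerpt
        )
        if normalize(prediction) == normalize(ex.target_title):
            hits += 1
    return hits / len(examples)
```

Exact-match on normalized titles is deliberately strict; a real harness might instead resolve predictions against a paper database before scoring.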
Alternatives and similar repositories for CiteME
Users interested in CiteME are comparing it to the repositories listed below.
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆42 · Updated 9 months ago
- Functional Benchmarks and the Reasoning Gap ☆88 · Updated 10 months ago
- ☆88 · Updated 7 months ago
- ☆76 · Updated this week
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆93 · Updated 8 months ago
- ☆54 · Updated last month
- Mixing Language Models with Self-Verification and Meta-Verification ☆105 · Updated 7 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated 10 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆99 · Updated last week
- ☆118 · Updated 11 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆59 · Updated 8 months ago
- Evaluating LLMs with fewer examples ☆160 · Updated last year
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆268 · Updated 10 months ago
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models ☆135 · Updated last week
- Discovering Data-driven Hypotheses in the Wild ☆104 · Updated 2 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆97 · Updated 2 months ago
- Official implementation of the ACL 2024 paper: Scientific Inspiration Machines Optimized for Novelty ☆84 · Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting". ☆108 · Updated 2 months ago
- ☆28 · Updated 4 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆109 · Updated 10 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆100 · Updated 3 months ago
- Analysis code for paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆45 · Updated this week
- ☆95 · Updated 3 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆92 · Updated 2 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆166 · Updated this week
- An automated tool for discovering insights from research paper corpora ☆138 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated 11 months ago
- Code/data for MARG (multi-agent review generation) ☆47 · Updated 8 months ago
- ☆34 · Updated 2 months ago