bethgelab / CiteME
CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.
☆48 · Updated 3 weeks ago
Alternatives and similar repositories for CiteME
Users who are interested in CiteME are comparing it to the libraries listed below.
- Functional Benchmarks and the Reasoning Gap ☆90 · Updated last year
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆42 · Updated last year
- ☆87 · Updated this week
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 9 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆110 · Updated 11 months ago
- ☆120 · Updated last year
- Source code for the collaborative reasoner research project at Meta FAIR. ☆106 · Updated 7 months ago
- ☆62 · Updated 5 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆106 · Updated this week
- Evaluating LLMs with fewer examples ☆168 · Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆101 · Updated 11 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆102 · Updated 3 months ago
- ☆90 · Updated 3 weeks ago
- ☆35 · Updated 6 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated 2 years ago
- Official repo for Learning to Reason for Long-Form Story Generation ☆72 · Updated 7 months ago
- ☆48 · Updated last year
- ☆124 · Updated 9 months ago
- ☆86 · Updated last year
- Code for ExploreTom ☆87 · Updated 5 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆300 · Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 · Updated last year
- Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆55 · Updated 3 months ago
- ☆143 · Updated 2 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆234 · Updated 4 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 10 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆109 · Updated last year
- The first dense retrieval model that can be prompted like an LM ☆89 · Updated 6 months ago