bethgelab / CiteME
CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.
☆38 · Updated last week
Related projects
Alternatives and complementary repositories for CiteME
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆61 · Updated 4 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆119 · Updated 3 weeks ago
- ☆74 · Updated 2 weeks ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆142 · Updated last month
- Codebase accompanying the Summary of a Haystack paper. ☆72 · Updated last month
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- Repository for the paper Stream of Search: Learning to Search in Language ☆84 · Updated 3 months ago
- ☆111 · Updated last month
- Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty" ☆68 · Updated 6 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆91 · Updated 4 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆97 · Updated last year
- ☆246 · Updated 4 months ago
- Official implementation for <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>, accepted by ACL 2024. It a… ☆35 · Updated 2 weeks ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 4 months ago
- An automated tool for discovering insights from research paper corpora ☆135 · Updated 5 months ago
- A simple unified framework for evaluating LLMs ☆138 · Updated this week
- ☆100 · Updated 3 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆27 · Updated 3 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆62 · Updated last year
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆61 · Updated 8 months ago
- Evaluating LLMs with CommonGen-Lite ☆84 · Updated 7 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆160 · Updated last month
- A virtual environment for developing and evaluating automated scientific discovery agents. ☆90 · Updated last month
- ☆18 · Updated 3 weeks ago
- Just a bunch of benchmark logs for different LLMs ☆114 · Updated 3 months ago
- Discovering Data-driven Hypotheses in the Wild ☆39 · Updated 2 weeks ago
- Can Language Models Solve Olympiad Programming? ☆100 · Updated 3 months ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆45 · Updated last month
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.… ☆36 · Updated 4 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆78 · Updated 8 months ago