bethgelab / CiteME
CiteME is a benchmark for testing the ability of language models to find the papers cited in scientific texts.
☆38 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for CiteME
- Codebase accompanying the "Summary of a Haystack" paper. ☆72 · Updated 2 months ago
- [ACL 2024] "Large Language Models for Automated Open-domain Scientific Hypotheses Discovery". It has also received the best poster award … ☆36 · Updated 3 weeks ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆46 · Updated 2 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆147 · Updated last month
- Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty" ☆68 · Updated 7 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆28 · Updated 2 weeks ago
- Public code repo for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆96 · Updated last month
- This repository includes the official implementation of "OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs". ☆99 · Updated this week
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆61 · Updated 4 months ago
- Repository for the paper "Stream of Search: Learning to Search in Language" ☆93 · Updated 3 months ago
- Code for the paper "ROUTERBENCH: A Benchmark for Multi-LLM Routing System" ☆92 · Updated 5 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆26 · Updated last year
- Code for PHATGOOSE, introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆78 · Updated 8 months ago
- Discovering Data-driven Hypotheses in the Wild ☆41 · Updated this week
- ReBase: Training Task Experts through Retrieval Based Distillation ☆27 · Updated 4 months ago
- Evaluation of neuro-symbolic engines ☆33 · Updated 3 months ago
- Code for the NeurIPS'24 paper "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" ☆162 · Updated last month
- The official repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆109 · Updated last year
- Experiments for efforts to train a new and improved T5 ☆76 · Updated 7 months ago
- Replicating O1 inference-time scaling laws ☆49 · Updated last month
- Code accompanying "How I learned to start worrying about prompt formatting". ☆95 · Updated last month
- Code for the EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆46 · Updated last month