bethgelab / CiteME
CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.
☆48Updated last year
Alternatives and similar repositories for CiteME
Users who are interested in CiteME are comparing it to the repositories listed below
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …☆42Updated last year
 - Functional Benchmarks and the Reasoning Gap☆89Updated last year
 - ☆80Updated 2 weeks ago
 - Codebase accompanying the Summary of a Haystack paper.☆79Updated last year
 - ☆119Updated last year
 - Mixing Language Models with Self-Verification and Meta-Verification☆109Updated 10 months ago
 - ReBase: Training Task Experts through Retrieval Based Distillation☆29Updated 8 months ago
 - Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆218Updated last week
 - [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆99Updated 11 months ago
 - 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀☆102Updated 2 months ago
 - Evaluating LLMs with fewer examples☆164Updated last year
 - ☆23Updated 8 months ago
 - Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆63Updated 10 months ago
 - ☆103Updated 9 months ago
 - Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"☆116Updated 2 weeks ago
 - ☆58Updated 4 months ago
 - ☆122Updated 8 months ago
 - Source code for the collaborative reasoner research project at Meta FAIR.☆103Updated 6 months ago
 - Attribute (or cite) statements generated by LLMs back to in-context information.☆294Updated last year
 - Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.☆44Updated 7 months ago
 - Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory☆159Updated 5 months ago
 - Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"☆109Updated last year
 - ☆48Updated last year
 - Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…☆56Updated last month
 - PyTorch library for Active Fine-Tuning☆93Updated last month
 - Discovering Data-driven Hypotheses in the Wild☆114Updated 4 months ago
 - Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval☆51Updated last year
 - Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆50Updated last year
 - Official repo for Learning to Reason for Long-Form Story Generation☆72Updated 6 months ago
 - [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery☆106Updated 2 months ago