bethgelab / CiteME
CiteME is a benchmark designed to test the ability of language models to find papers that are cited in scientific texts.
☆48Updated 2 months ago
Alternatives and similar repositories for CiteME
Users who are interested in CiteME are comparing it to the repositories listed below.
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …☆42Updated last year
- Functional Benchmarks and the Reasoning Gap☆90Updated last year
- Codebase accompanying the Summary of a Haystack paper.☆80Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation☆29Updated 11 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information.☆313Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification☆111Updated last year
- ☆121Updated last year
- ☆105Updated last year
- LangCode - Improving alignment and reasoning of large language models (LLMs) with natural language embedded program (NLEP).☆48Updated 2 years ago
- ☆92Updated 3 weeks ago
- ☆63Updated 6 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆65Updated last year
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆222Updated 3 weeks ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents☆55Updated 6 months ago
- ☆124Updated 10 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆102Updated last year
- ☆129Updated last year
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀☆102Updated 5 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"☆112Updated last year
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval☆51Updated last year
- Official repo for Learning to Reason for Long-Form Story Generation☆73Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆61Updated last year
- Discovering Data-driven Hypotheses in the Wild☆124Updated 7 months ago
- Replicating O1 inference-time scaling laws☆91Updated last year
- An automated tool for discovering insights from research paper corpora☆138Updated last year
- Evaluating LLMs with fewer examples☆170Updated last year
- PyTorch library for Active Fine-Tuning☆96Updated 3 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆51Updated last year
- The first dense retrieval model that can be prompted like an LM☆90Updated 8 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear"☆83Updated last year