bethgelab / CiteME

CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.

☆48

Alternatives and similar repositories for CiteME

Users who are interested in CiteME are comparing it to the repositories listed below.

ZonglinY / MOOSE
[ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …
☆42 Updated 8 months ago
yale-nlp / SciArena
Analysis code for paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"
☆42 Updated 2 weeks ago
allenai / infinigram-api
☆69 Updated last month
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆106 Updated 7 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88 Updated 9 months ago
para-lost / ReBase
ReBase: Training Task Experts through Retrieval Based Distillation
☆29 Updated 5 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79 Updated 9 months ago
gangiswag / infogent
☆20 Updated 4 months ago
EagleW / Scientific-Inspiration-Machines-Optimized-for-Novelty
Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty"
☆81 Updated last year
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆48 Updated 3 months ago
oriyor / assistantbench
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆58 Updated 7 months ago
joshuacnf / Ctrl-G
☆86 Updated 6 months ago
yueqis / API-Based-Agent
☆51 Updated 3 weeks ago
jonhue / activeft
PyTorch library for Active Fine-Tuning
☆87 Updated 5 months ago
letta-ai / sleep-time-compute
Accompanying material for the sleep-time compute paper
☆97 Updated 2 months ago
writer / writing-in-the-margins
☆118 Updated 10 months ago
allenai / discoverybench
Discovering Data-driven Hypotheses in the Wild
☆99 Updated last month
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆86 Updated last year
pgasawa / BARE
Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation
☆33 Updated 5 months ago
ai8hyf / OpenResearchAssistant
An automated tool for discovering insights from research paper corpora
☆138 Updated last year
sunblaze-ucb / reasoning_ladder
☆33 Updated 2 months ago
HishamAlyahya / semantic_backprop
Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖
☆71 Updated 7 months ago
MadryLab / context-cite
Attribute (or cite) statements generated by LLMs back to in-context information.
☆245 Updated 9 months ago
da03 / WildVisualizer
☆22 Updated 3 weeks ago
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆95 Updated 3 months ago
allenai / marg-reviewer
Code/data for MARG (multi-agent review generation)
☆44 Updated 8 months ago
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49 Updated last year
suzgunmirac / dynamic-cheatsheet
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
☆66 Updated last month
ExtensityAI / benchmark
Evaluation of neuro-symbolic engines
☆38 Updated 11 months ago
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆32 Updated 3 months ago