youdotcom-oss / ydc-deep-research-evals
You.com's framework for evaluating deep research systems.
☆62 · Updated 7 months ago
Alternatives and similar repositories for ydc-deep-research-evals
Users who are interested in ydc-deep-research-evals are comparing it to the libraries listed below:
- Official Repo for CRMArena and CRMArena-Pro · ☆127 · Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper. · ☆80 · Updated last year
- ☆42 · Updated last year
- ☆59 · Updated last year
- The first dense retrieval model that can be prompted like an LM · ☆90 · Updated 8 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" · ☆65 · Updated last year
- Streamline on-policy/off-policy distillation workflows in a few lines of code · ☆87 · Updated this week
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… · ☆250 · Updated this week
- ☆92 · Updated 3 weeks ago
- ☆104 · Updated 9 months ago
- Verifiers for LLM Reinforcement Learning · ☆79 · Updated 8 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" · ☆118 · Updated 2 months ago
- Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" · ☆55 · Updated 5 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory · ☆236 · Updated 7 months ago
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models". · ☆86 · Updated 9 months ago
- Track the progress of LLM context utilisation · ☆55 · Updated 8 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. · ☆111 · Updated 8 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". · ☆69 · Updated last year
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… · ☆27 · Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization · ☆72 · Updated last year
- ☆63 · Updated 6 months ago
- A set of utilities for running few-shot prompting experiments on large language models · ☆126 · Updated 2 years ago
- LLM reads a paper and produces a working prototype · ☆60 · Updated 9 months ago
- Mixing Language Models with Self-Verification and Meta-Verification · ☆111 · Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation · ☆29 · Updated 11 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset · ☆59 · Updated 5 months ago
- LangCode - Improving alignment and reasoning of large language models (LLMs) with natural language embedded program (NLEP). · ☆48 · Updated 2 years ago
- Explore the use of DSPy for extracting features from PDFs 🔎 · ☆49 · Updated last year
- Doing simple retrieval from LLMs at various context lengths to measure accuracy · ☆107 · Updated 3 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models · ☆101 · Updated 2 years ago