youdotcom-oss / ydc-deep-research-evals
You.com's framework for evaluating deep research systems.
☆48 · Updated 5 months ago
Alternatives and similar repositories for ydc-deep-research-evals
Users interested in ydc-deep-research-evals are comparing it to the libraries listed below.
- Official Repo for CRMArena and CRMArena-Pro ☆119 · Updated 4 months ago
- ☆43 · Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 · Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆68 · Updated last year
- ☆58 · Updated last year
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆59 · Updated 5 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆99 · Updated this week
- ☆80 · Updated 2 weeks ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 · Updated 10 months ago
- ☆78 · Updated last week
- The first dense retrieval model that can be prompted like an LM ☆89 · Updated 5 months ago
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation ☆36 · Updated last week
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆63 · Updated 10 months ago
- A method for steering LLMs to better follow instructions ☆55 · Updated 2 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 · Updated 6 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆109 · Updated 10 months ago
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆24 · Updated 3 years ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆57 · Updated 3 months ago
- Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆53 · Updated 2 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆169 · Updated this week
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 · Updated last year
- ☆96 · Updated 7 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆24 · Updated this week
- ☆58 · Updated 4 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆50 · Updated last year
- Verifiers for LLM Reinforcement Learning ☆77 · Updated 6 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 8 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆50 · Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆92 · Updated 3 weeks ago