you.com's framework for evaluating deep research systems.
☆76 · May 15, 2025 · Updated last year
Alternatives and similar repositories for ydc-deep-research-evals
Users who are interested in ydc-deep-research-evals are comparing it to the libraries listed below.
- A curated collection of papers and related projects on using LLMs for privacy. ☆31 · Oct 8, 2025 · Updated 7 months ago
- Code repository for BEEP (Biomedical Evidence Enhanced Predictions) clinical outcome prediction system ☆26 · Nov 8, 2023 · Updated 2 years ago
- ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry ☆49 · Jan 5, 2026 · Updated 4 months ago
- Official implementation of Browse-Master, a tool-augmented web-search agent. ☆31 · Aug 22, 2025 · Updated 9 months ago
- ☆10 · May 9, 2021 · Updated 5 years ago
- Data and code for "A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization" (ACL 2020) ☆48 · Jun 12, 2023 · Updated 2 years ago
- This project contains the original white paper for Language Construct Modeling (LCM) v1.13, authored by Vincent Shing Hin Chong. It intro… ☆15 · Jul 23, 2025 · Updated 10 months ago
- ☆13 · May 7, 2026 · Updated 3 weeks ago
- 🚀 LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code ex… ☆41 · Oct 20, 2025 · Updated 7 months ago
- ☆12 · Nov 5, 2024 · Updated last year
- Official implementation of Data Contamination Can Cross Language Barriers ☆12 · Sep 11, 2024 · Updated last year
- A simple python wrapper for using the Caddy API ☆27 · May 20, 2026 · Updated last week
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents ☆730 · May 11, 2026 · Updated 2 weeks ago
- Code for Findings of ACL 2021 paper "Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain … ☆19 · Dec 16, 2022 · Updated 3 years ago
- [SIGIR24] Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval ☆18 · Feb 29, 2024 · Updated 2 years ago
- Source code for "SimCKP: Simple Contrastive Learning of Keyphrase Representations", Findings of EMNLP 2023 ☆12 · Jun 20, 2025 · Updated 11 months ago
- Benchmarking LLMs and Agents in Rigorous Financial Analysis and Forecast ☆24 · May 10, 2026 · Updated 2 weeks ago
- Code and datasets for the ACL 2020 paper "Detecting Perceived Emotions in Hurricane Disasters" ☆12 · Oct 4, 2022 · Updated 3 years ago
- Forecastbench Datasets, updated nightly ☆28 · May 21, 2026 · Updated last week
- ☆12 · Apr 24, 2024 · Updated 2 years ago
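- Built around an e-commerce shopping-guide chatbot: natural language understanding (NLU), text error correction, and ambiguous-word disambiguation ☆12 · May 5, 2020 · Updated 6 years ago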
- Metaprompt is an AI-powered prompt generator developed by Anthropic. This is the unofficial Metaprompt Community Github repo. All PRs are… ☆14 · Mar 19, 2024 · Updated 2 years ago
- A toolkit to induce interpretable workflows from raw computer-use activities. ☆44 · Nov 13, 2025 · Updated 6 months ago
- AkashOS is an unattended install of Ubuntu Server that will become the operating system of the machine. Akash OS will create a Kubernetes… ☆13 · Dec 22, 2025 · Updated 5 months ago
- Official repository for ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use ☆30 · Nov 4, 2025 · Updated 6 months ago
- ☆19 · Oct 13, 2022 · Updated 3 years ago
- Open Source Implementation of Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evo… ☆100 · Jul 18, 2025 · Updated 10 months ago
- Source code for paper "PRiSM: Enhancing Low-Resource Document-Level Relation Extraction with Relation-Aware Score Calibration", Findings … ☆11 · Jun 20, 2025 · Updated 11 months ago
- A command-line interface tool for creating, managing, and verifying Content Provenance and Authenticity (C2PA) manifests for machine lear… ☆22 · May 22, 2026 · Updated last week
- Agent framework for generating a synthetic dataset. This will be raw CoT and Reflection output to be cleaned up by a later step. ☆17 · Apr 11, 2025 · Updated last year
- Code and Data for "SCTc-TE: A Comprehensive Formulation and Benchmark for Temporal Event Forecasting" ☆16 · Feb 2, 2024 · Updated 2 years ago
- A collection of Tiptap extensions, versioned and released independently. ☆25 · May 3, 2026 · Updated 3 weeks ago
- RAG methods, benchmarks, and toolkits ☆19 · Nov 28, 2024 · Updated last year
- ☆14 · Oct 30, 2021 · Updated 4 years ago
- HashiCorp Vault Plugins for Redis Enterprise ☆16 · May 13, 2026 · Updated 2 weeks ago
- AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and … ☆11 · Feb 11, 2025 · Updated last year
- Filter dialog data with a simple entropy-based method (see ACL paper) ☆14 · Oct 4, 2019 · Updated 6 years ago
- prediction markets -> llm -> news ☆25 · Nov 10, 2025 · Updated 6 months ago
- Demonstrate using MCP with Pydantic AI framework ☆14 · Mar 14, 2025 · Updated last year