Su-Sea / ydc-deep-research-evals
You.com's framework for evaluating deep research systems.
☆18 · Updated 2 months ago
Alternatives and similar repositories for ydc-deep-research-evals
Users interested in ydc-deep-research-evals are comparing it to the libraries listed below.
- Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆15 · Updated 10 months ago
- Fullstack chatbot application ☆11 · Updated last week
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆30 · Updated 9 months ago
- ☆40 · Updated last year
- Analysis code for paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆42 · Updated 2 weeks ago
- A repository of projects and datasets under active development by Alignment Lab AI ☆22 · Updated last year
- Notebooks for tutorials from Marktechpost ☆101 · Updated this week
- ☆27 · Updated 11 months ago
- The Foundation Model Transparency Index ☆81 · Updated last year
- Streamlit app for recommending eval functions using prompt diffs ☆28 · Updated last year
- Supervised instruction fine-tuning for LLMs with the HF Trainer and DeepSpeed ☆35 · Updated 2 years ago
- Conversational AI tooling & personas built on Cohere's LLMs ☆174 · Updated last year
- A sandbox repo for grounded question answering with Cohere and Google Search ☆136 · Updated last year
- Never forget anything again! Combine AI and intelligent tooling for a local knowledge base to track, catalogue, annotate, and plan for you… ☆37 · Updated last year
- Public repository containing METR's DVC pipeline for eval data analysis ☆78 · Updated 3 months ago
- Recipes and resources for building, deploying, and fine-tuning generative AI with Fireworks. ☆117 · Updated last week
- Mixing Language Models with Self-Verification and Meta-Verification ☆106 · Updated 7 months ago
- ☆166 · Updated last week
- Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network ☆78 · Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated 11 months ago
- ☆62 · Updated 2 months ago
- ☆47 · Updated last year
- A tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models ☆74 · Updated 4 months ago
- PageRank for LLMs ☆43 · Updated 3 months ago
- ☆14 · Updated last year
- A specification for OpenInference, a semantic mapping of ML inferences ☆47 · Updated last year
- Tutorial to get started with SkyPilot! ☆58 · Updated last year
- ☆23 · Updated 2 years ago
- ☆30 · Updated 8 months ago
- PARIS (Perpetual Adaptive Regenerative Intelligence System) is a conceptual model for building and managing effective AI and Language Mod… ☆24 · Updated 2 years ago