guestrin-lab / deepscholar-bench
benchmark and evaluate generative research synthesis
☆35 Updated last week
Alternatives and similar repositories for deepscholar-bench
Users who are interested in deepscholar-bench are comparing it to the repositories listed below
- Source code for the collaborative reasoner research project at Meta FAIR. ☆102 Updated 4 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆161 Updated 4 months ago
- accompanying material for sleep-time compute paper ☆107 Updated 4 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆88 Updated this week
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆75 Updated 9 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆120 Updated 6 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆135 Updated last year
- Library for text-to-text regression, applicable to any input string representation and allows pretraining and fine-tuning over multiple r… ☆209 Updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆182 Updated 5 months ago
- ☆94 Updated 5 months ago
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. ☆96 Updated last month
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆47 Updated 4 months ago
- Automating enterprise workflows with multimodal agents ☆110 Updated 10 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆134 Updated last week
- A framework for optimizing DSPy programs with RL ☆154 Updated this week
- ☆145 Updated last year
- Train your own SOTA deductive reasoning model ☆105 Updated 6 months ago
- Training-Ready RL Environments + Evals ☆65 Updated this week
- Repository for the paper Stream of Search: Learning to Search in Language ☆150 Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated 11 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 7 months ago
- A framework for fine-tuning retrieval-augmented generation (RAG) systems. ☆128 Updated this week
- ☆212 Updated 6 months ago
- ☆80 Updated 3 weeks ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆55 Updated last month
- ☆66 Updated this week
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok … ☆25 Updated last month
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆167 Updated this week
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆92 Updated 3 months ago
- Simple repository for training small reasoning models ☆38 Updated 7 months ago