google-deepmind / questbench
☆19 · Updated last week
Alternatives and similar repositories for questbench:
Users who are interested in questbench are comparing it to the repositories listed below.
- Aioli: A unified optimization framework for language model data mixing · ☆25 · Updated 3 months ago
- A testbed for agents and environments that can automatically improve models through data generation. · ☆24 · Updated 2 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" · ☆54 · Updated 5 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models · ☆41 · Updated 10 months ago
- Implementation of dualformer · ☆16 · Updated 2 months ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models · ☆55 · Updated 2 months ago
- ☆31 · Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ☆57 · Updated 8 months ago
- ☆15 · Updated last month
- ☆23 · Updated last month
- Efficient scaling laws and collaborative pretraining. · ☆16 · Updated 3 months ago
- Minimum Description Length probing for neural network representations · ☆19 · Updated 3 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 · ☆32 · Updated 7 months ago
- ☆22 · Updated 6 months ago
- Learn online intrinsic rewards from LLM feedback · ☆37 · Updated 4 months ago
- A repository for research on medium-sized language models. · ☆76 · Updated 11 months ago
- ☆18 · Updated 7 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) · ☆29 · Updated 9 months ago
- Python package for generating datasets to evaluate reasoning and retrieval of large language models · ☆18 · Updated last week
- ☆27 · Updated 9 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ · ☆32 · Updated 3 weeks ago
- Repository for Skill Set Optimization · ☆12 · Updated 9 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling · ☆29 · Updated last month
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) · ☆19 · Updated 2 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients · ☆26 · Updated 7 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs · ☆54 · Updated last year
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL · ☆48 · Updated 6 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents · ☆35 · Updated last week
- Learning to Retrieve by Trying: source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval · ☆33 · Updated 6 months ago
- Repository for the paper Stream of Search: Learning to Search in Language · ☆145 · Updated 3 months ago