google / curie
Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025
☆28 Updated 8 months ago
Alternatives and similar repositories for curie
Users who are interested in curie are comparing it to the libraries listed below
- [ICLR 2025] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning https://arxiv.org/abs/2501.06590 ☆78 Updated 5 months ago
- Implementation of dualformer ☆24 Updated 10 months ago
- Defeating the Training-Inference Mismatch via FP16 ☆172 Updated last month
- AIRA-dojo: a framework for developing and evaluating AI research agents ☆122 Updated last month
- ☆18 Updated 5 months ago
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning ☆110 Updated last month
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆118 Updated 2 months ago
- ☆35 Updated 7 months ago
- Official JAX implementation of End-to-End Test-Time Training for Long Context ☆214 Updated last week
- Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning". ☆28 Updated 3 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆27 Updated 10 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024] ☆71 Updated 11 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆63 Updated this week
- ☆46 Updated 6 months ago
- ☆24 Updated 9 months ago
- A collection of resources and papers on AI Scientist / Robot Scientist ☆117 Updated 3 months ago
- Esoteric Language Models ☆108 Updated last month
- ☆34 Updated 7 months ago
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆173 Updated 3 months ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning ☆14 Updated 6 months ago
- ☆365 Updated 2 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆32 Updated 5 months ago
- Process Reward Models That Think ☆70 Updated last month
- Demystifying Reinforcement Learning in Agentic Reasoning ☆146 Updated 2 months ago
- ☆88 Updated 2 months ago
- Official repository for the paper Number Cookbook: Number Understanding of Language Models and How to Improve It. ☆19 Updated 9 months ago
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 Updated 7 months ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆118 Updated 4 months ago
- UQ: Assessing Language Models on Unsolved Questions ☆29 Updated 4 months ago
- ☆41 Updated 7 months ago