google / curie
Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025
☆28 Updated 7 months ago
Alternatives and similar repositories for curie
Users who are interested in curie are comparing it to the libraries listed below.
- implementation of dualformer ☆24 Updated 9 months ago
- [ICLR 2025] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning https://arxiv.org/abs/2501.06590 ☆78 Updated 4 months ago
- Optimize Any User-defined Compound AI Systems ☆63 Updated 4 months ago
- Official Implementation of the Baby-AIGS system ☆24 Updated last year
- Defeating the Training-Inference Mismatch via FP16 ☆165 Updated last month
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆58 Updated 2 months ago
- ☆352 Updated last month
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning ☆109 Updated 3 weeks ago
- A collection of resources and papers on AI Scientist / Robot Scientist ☆116 Updated 2 months ago
- SSRL: Self-Search Reinforcement Learning ☆158 Updated 4 months ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme… ☆19 Updated last year
- A testbed for agents and environments that can automatically improve models through data generation. ☆27 Updated 9 months ago
- Official repository for the paper Number Cookbook: Number Understanding of Language Models and How to Improve It. ☆19 Updated 8 months ago
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆171 Updated 3 months ago
- ☆226 Updated 9 months ago
- Process Reward Models That Think ☆64 Updated 3 weeks ago
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆42 Updated last year
- ☆24 Updated 8 months ago
- Demystifying Reinforcement Learning in Agentic Reasoning ☆133 Updated 2 months ago
- [NeurIPS'24 LanGame workshop] On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆41 Updated 5 months ago
- ☆41 Updated 6 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024] ☆70 Updated 11 months ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025) ☆35 Updated 5 months ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning ☆14 Updated 5 months ago
- ☆34 Updated 7 months ago
- ☆17 Updated 4 months ago
- ☆51 Updated 10 months ago
- ☆44 Updated 5 months ago
- ☆35 Updated 7 months ago
- The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning" ☆36 Updated 2 months ago