GAIR-NLP / lm-open-science-evaluation
Reproducible and flexible LLM evaluations for scientific reasoning.
☆26 · Jul 23, 2025 · Updated 6 months ago
Alternatives and similar repositories for lm-open-science-evaluation
Users who are interested in lm-open-science-evaluation are comparing it to the libraries listed below.
- Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness" ☆21 · Feb 17, 2025 · Updated 11 months ago
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning ☆112 · Feb 2, 2026 · Updated last week
- ☆78 · May 22, 2024 · Updated last year
- ☆12 · Jun 19, 2024 · Updated last year
- 💀 gigasmol: a lightweight wrapper for gigachat api model for seamless use with smolagents. ☆15 · Oct 23, 2025 · Updated 3 months ago
- ☆24 · Aug 19, 2025 · Updated 5 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆81 · Dec 25, 2025 · Updated last month
- [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a Pretraining Objective ☆232 · Jan 26, 2026 · Updated 2 weeks ago
- "China Optics Valley · Huawei Cup" 19th China Post-Graduate Mathematical Contest in Modeling (2022) ☆10 · Jul 9, 2023 · Updated 2 years ago
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) ☆11 · Apr 18, 2025 · Updated 9 months ago
- ☆11 · Jun 4, 2021 · Updated 4 years ago
- Risky Object Localization (ROL) in a Driving Scene Dataset