OpenDFM / SciEval
[AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
☆29 · Updated 7 months ago
Alternatives and similar repositories for SciEval:
Users who are interested in SciEval are comparing it to the repositories listed below:
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ☆74 · Updated 3 months ago
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024) ☆78 · Updated last year
- Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision ☆14 · Updated 2 weeks ago
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆78 · Updated 3 weeks ago
- A curated list of papers on LLMs and agents for scientific research and development ☆46 · Updated 3 months ago
- Pre-trained Language Model for Scientific Text ☆44 · Updated last year
- Structured Chemistry Reasoning with Large Language Models ☆35 · Updated 10 months ago
- Code implementation of synthetic continued pretraining ☆95 · Updated 2 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆67 · Updated last month
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023) ☆31 · Updated 10 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆46 · Updated 9 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆95 · Updated 2 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆64 · Updated last week
- [NeurIPS 2024] The official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" ☆104 · Updated last week
- A trainable user simulator ☆34 · Updated 6 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆162 · Updated 2 weeks ago
- Code for the paper "Teaching Language Models to Critique via Reinforcement Learning" ☆84 · Updated last month
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆103 · Updated 2 weeks ago
- Official code repository for the LM-Steer paper "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) ☆91 · Updated 6 months ago
- An implementation of the paper "Improve Mathematical Reasoning in Language Models by Automated Process Supervision" from google de… ☆26 · Updated 3 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆118 · Updated 4 months ago
- We introduce ScaleQuest, a scalable, novel, and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆60 · Updated 5 months ago
- Repo of the paper "Free Process Rewards without Process Labels" ☆138 · Updated 2 weeks ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆36 · Updated 11 months ago