OpenDFM / SciEval
[AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
☆27 · Updated 6 months ago
Alternatives and similar repositories for SciEval:
Users who are interested in SciEval compare it to the repositories listed below.
- Structured Chemistry Reasoning with Large Language Models · ☆32 · Updated 9 months ago
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024) · ☆78 · Updated 11 months ago
- Pre-trained Language Model for Scientific Text · ☆44 · Updated last year
- [ICLR 2025] Benchmarking Agentic Workflow Generation · ☆47 · Updated this week
- Code implementation of synthetic continued pretraining · ☆88 · Updated last month
- A trainable user simulator · ☆34 · Updated 5 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … · ☆67 · Updated last month
- A dataset of LLM-generated chain-of-thought steps annotated with mistake locations · ☆77 · Updated 6 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… · ☆44 · Updated 7 months ago
- The official repository for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023) · ☆27 · Updated 9 months ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models · ☆175 · Updated 4 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] · ☆59 · Updated 3 months ago
- Official code repository for the LM-Steer paper "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) · ☆84 · Updated 4 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] · ☆135 · Updated 3 months ago
- Repository for the paper "Free Process Rewards without Process Labels" · ☆123 · Updated last month
- What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks · ☆137 · Updated 6 months ago
- Official implementation for "Law of the Weakest Link: Cross Capabilities of Large Language Models" · ☆42 · Updated 4 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" · ☆154 · Updated 2 months ago
- We introduce ScaleQuest, a scalable, novel, and cost-effective data synthesis method to unleash the reasoning capability of LLMs · ☆60 · Updated 3 months ago