OpenDFM / SciEval
[AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
☆25 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for SciEval
- Structured Chemistry Reasoning with Large Language Models · ☆31 · Updated 6 months ago
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024) · ☆67 · Updated 8 months ago
- ☆103 · Updated 4 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers · ☆75 · Updated last month
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" · ☆33 · Updated 10 months ago
- Pre-trained Language Model for Scientific Text · ☆42 · Updated 9 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA · ☆93 · Updated last week
- Code implementation of synthetic continued pretraining · ☆60 · Updated last month
- Official repository for the paper "Weak-to-Strong Extrapolation Expedites Alignment" · ☆68 · Updated 5 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 · ☆69 · Updated last month
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) · ☆35 · Updated 7 months ago
- ☆89 · Updated 11 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models · ☆73 · Updated 8 months ago
- ☆39 · Updated last month
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" · ☆91 · Updated 4 months ago
- ☆56 · Updated 9 months ago
- InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales · ☆54 · Updated last week
- Benchmarking Agentic Workflow Generation · ☆29 · Updated 2 weeks ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) · ☆97 · Updated 7 months ago
- ☆20 · Updated last month
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. · ☆73 · Updated 3 months ago
- PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes · ☆19 · Updated this week
- A trainable user simulator · ☆28 · Updated 2 months ago
- The code and data for the paper JiuZhang3.0 · ☆35 · Updated 5 months ago
- Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" · ☆64 · Updated last week
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. · ☆51 · Updated 3 weeks ago
- A method of ensemble learning for heterogeneous large language models. · ☆30 · Updated 3 months ago
- ☆62 · Updated 3 weeks ago
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models · ☆69 · Updated last month
- ☆72 · Updated 5 months ago