KwanWaiChung / M4LE
Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
☆22 · Updated 5 months ago
Alternatives and similar repositories for M4LE:
Users who are interested in M4LE are comparing it to the repositories listed below.
- ☆47 · Updated 9 months ago
- ☆33 · Updated 2 years ago
- ☆60 · Updated 2 years ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆55 · Updated 6 months ago
- Towards Systematic Measurement for Long Text Quality ☆31 · Updated 4 months ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective ☆30 · Updated last year
- Evaluating the Ripple Effects of Knowledge Editing in Language Models ☆53 · Updated 9 months ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning". ☆63 · Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆57 · Updated last year
- Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning ☆35 · Updated last year
- Provides a minimal implementation to extract FLAN datasets for further processing ☆11 · Updated last year
- ☆16 · Updated 10 months ago
- ☆52 · Updated 4 months ago
- 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts ☆35 · Updated 3 months ago
- Methods and evaluation for aligning language models temporally ☆27 · Updated 10 months ago
- ☆84 · Updated 2 years ago
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 · Updated last year
- ☆28 · Updated last year
- [ICML 2023] Code for our paper "Compositional Exemplars for In-context Learning". ☆95 · Updated last year
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆20 · Updated 2 months ago
- Analyzing LLM Alignment via Token distribution shift ☆14 · Updated 11 months ago
- ☆26 · Updated 3 weeks ago
- 🩺 A collection of ChatGPT evaluation reports on various benchmarks. ☆48 · Updated last year
- [EMNLP 2022] Code and data for "Controllable Dialogue Simulation with In-Context Learning" ☆34 · Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆154 · Updated 6 months ago
- Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge" ☆74 · Updated last year
- ☆21 · Updated last year
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆58 · Updated last year
- ☆81 · Updated last year
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 · Updated last year