Felixgithub2017 / MMCU
MEASURING MASSIVE MULTITASK CHINESE UNDERSTANDING
☆88Updated 11 months ago
Alternatives and similar repositories for MMCU:
Users that are interested in MMCU are comparing it to the libraries listed below
- Chinese large language model evaluation, phase 2☆70Updated last year
- Chinese large language model evaluation, phase 1☆107Updated last year
- ☆173Updated last year
- ☆159Updated last year
- ☆95Updated last year
- ☆59Updated last year
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"☆119Updated 8 months ago
- ☆132Updated 10 months ago
- ☆128Updated last year
- ☆97Updated 11 months ago
- A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark☆100Updated last year
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆112Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models☆39Updated 11 months ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆74Updated 3 months ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?"(AAAI2024)☆47Updated 10 months ago
- ☆140Updated 8 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.☆137Updated 8 months ago
- T2Ranking: A large-scale Chinese benchmark for passage ranking.☆153Updated last year
- Source code for ACL 2023 paper Decoder Tuning: Efficient Language Understanding as Decoding☆48Updated last year
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)☆67Updated last week
- NTK scaled version of ALiBi position encoding in Transformer.☆67Updated last year
- MD5 links for a Chinese book corpus☆216Updated last year
- OPD: Chinese Open-Domain Pre-trained Dialogue Model☆75Updated last year
- Zero-shot learning evaluation benchmark, Chinese version☆54Updated 3 years ago
- Simple and efficient multi-GPU fine-tuning of large models with deepspeed + trainer☆122Updated last year
- make LLM easier to use☆59Updated last year
- How to train an LLM tokenizer☆141Updated last year
- Chinese instruction tuning datasets☆127Updated 10 months ago
- Light local website for displaying performances from different chat models.☆85Updated last year
- code for Scaling Laws of RoPE-based Extrapolation☆70Updated last year