MikeGu721/XiezhiBenchmark


☆98

Alternatives and similar repositories for XiezhiBenchmark

Users who are interested in XiezhiBenchmark are comparing it to the libraries listed below.


Abbey4799 / CuteGPT
An open-source conversational language model developed by the Knowledge Works Research Laboratory at Fudan University.
☆64 · Oct 12, 2023 · Updated 2 years ago
siyuyuan / coscript
Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning
☆36 · Aug 19, 2023 · Updated 2 years ago
Felixgithub2017 / CG-Eval
Chinese Generation Evaluation
☆13 · Aug 14, 2023 · Updated 2 years ago
tjunlp-lab / M3KE
A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark
☆106 · Jul 20, 2023 · Updated 3 years ago
MikeGu721 / AgentGroup
☆95 · Mar 26, 2024 · Updated 2 years ago
Felixgithub2017 / MMCU
Measuring Massive Multitask Chinese Understanding
☆90 · Mar 24, 2024 · Updated 2 years ago
OpenMOSS / HalluQA
Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"
☆139 · Jun 5, 2024 · Updated 2 years ago
ruixiangcui / AGIEval
☆774 · Jun 13, 2024 · Updated 2 years ago
flageval-baai / FlagEval
FlagEval is an evaluation toolkit for large AI foundation models.
☆338 · Apr 24, 2025 · Updated last year
Judenpech / MLEC-QA
Data and baseline code for the EMNLP 2021 paper "MLEC-QA: A Chinese Multi-Choice Biomedical Question Answering Dataset".
☆32 · Nov 5, 2021 · Updated 4 years ago
haonan-li / CMMLU
CMMLU: Measuring massive multitask language understanding in Chinese
☆829 · Dec 6, 2024 · Updated last year
hkust-nlp / ceval
Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
☆1,862 · Jul 27, 2025 · Updated 11 months ago
THU-KEG / KoLA
[ICLR24] The open-source repo of THU-KEG's KoLA benchmark.
☆57 · Sep 28, 2023 · Updated 2 years ago
Abbey4799 / PLMs-Interpret-Simile
Code and datasets for the paper "Can Pre-trained Language Models Interpret Similes as Smart as Human?" (ACL 2022)
☆14 · Jan 4, 2023 · Updated 3 years ago
OpenLMLab / GAOKAO-Bench
GAOKAO-Bench is an evaluation framework that uses questions from the GAOKAO (China's national college entrance exam) as a dataset to evaluate large language models.
☆780 · Jan 7, 2025 · Updated last year
multimodal-art-projection / CodeCriticBench
☆16 · Nov 1, 2025 · Updated 8 months ago
oceanypt / Court-View-Gen
Interpretable Charge Predictions for Criminal Cases: Learning to Generate Court Views from Fact Descriptions
☆15 · May 7, 2018 · Updated 8 years ago
MikeGu721 / CS_arxiv_everyweek
Weekly updates of Computer Science papers uploaded to arXiv.
☆106 · Feb 13, 2026 · Updated 5 months ago
Xwin-LM / Xwin-LM
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
☆1,037 · May 31, 2024 · Updated 2 years ago
sufengniu / RefGPT
☆164 · Apr 17, 2023 · Updated 3 years ago
tuhinjubcse / SimileGeneration-EMNLP2020
Code for SCOPE (Style transfer through COmmonsense PropErty), a style transfer approach that converts literal sentences into similes
☆19 · Apr 18, 2021 · Updated 5 years ago
THUDM / AlignBench
A multi-dimensional Chinese alignment benchmark for large language models (ACL 2024)
☆430 · Oct 25, 2025 · Updated 9 months ago
AI-EDU-LAB / E-EVAL
Official GitHub repo for E-Eval, a Chinese K12 education evaluation benchmark for LLMs.
☆32 · Feb 19, 2024 · Updated 2 years ago
open-compass / LawBench
Benchmarking Legal Knowledge of Large Language Models
☆441 · Nov 13, 2023 · Updated 2 years ago
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆417 · Jun 25, 2025 · Updated last year
luyuntao92 / ChatLLM-research
☆21 · Sep 12, 2023 · Updated 2 years ago
Alibaba-NLP / CDQA
CDQA: Chinese Dynamic Question Answering Benchmark
☆17 · Dec 13, 2024 · Updated last year
Abbey4799 / CELLO
Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024)
☆51 · Apr 19, 2024 · Updated 2 years ago
chenxran / Orion
[NeurIPS 2021] Open Rule Induction
☆19 · May 22, 2022 · Updated 4 years ago
WeOpenML / PandaLM
☆926 · May 22, 2024 · Updated 2 years ago
Charrrrrlie / Mask-as-Supervision
[ECCV2024] The official repository of the paper "Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estima…
☆18 · Nov 21, 2024 · Updated last year
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆31 · Mar 5, 2024 · Updated 2 years ago
open-compass / opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, …
☆7,235 · Updated this week
ExpressAI / reStructured-Pretraining
reStructured Pre-training
☆99 · Dec 22, 2022 · Updated 3 years ago
OpenLMLab / MOSS_Vortex
Moss Vortex is a lightweight, high-performance deployment and inference backend engineered specifically for MOSS 003, providing a weal…
☆37 · Apr 25, 2023 · Updated 3 years ago
CLUEbenchmark / SuperCLUE
SuperCLUE: A comprehensive benchmark for general-purpose Chinese foundation models
☆3,297 · Feb 6, 2026 · Updated 5 months ago
mutonix / RefGPT
☆98 · Mar 20, 2024 · Updated 2 years ago
RUCAIBox / Slow_Thinking_with_LLMs
A series of technical reports on slow thinking with LLMs
☆767 · Aug 13, 2025 · Updated 11 months ago
nanduan / NLPCC-KBQA
NLPCC-KBQA Dataset
☆15 · Dec 7, 2021 · Updated 4 years ago