LivingFutureLab/ChineseSimpleQA

☆79

Alternatives and similar repositories for ChineseSimpleQA

Users who are interested in ChineseSimpleQA are comparing it to the repositories listed below.

LivingFutureLab / ChineseSafetyQA
☆36 · Jan 7, 2025 · Updated last year
LivingFutureLab / DeltaBench
☆45 · Mar 4, 2025 · Updated last year
PALIN2018 / BrowseComp-ZH
☆158 · May 14, 2025 · Updated last year
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆61 · May 20, 2024 · Updated 2 years ago
multimodal-art-projection / CodeCriticBench
☆16 · Nov 1, 2025 · Updated 8 months ago
chenzen94 / debug-deepspeed-chat
Debug DeepSpeed-Chat step by step in an IDE
☆10 · Apr 17, 2023 · Updated 3 years ago
neulab / SWE-Playground
Official Repository for "Training Versatile Coding Agents in Synthetic Environments"
☆22 · Jan 11, 2026 · Updated 6 months ago
CLUEbenchmark / SuperCLUE-Code3
A native-Chinese, tiered benchmark for testing code capability
☆15 · Apr 11, 2024 · Updated 2 years ago
AI-EDU-LAB / E-EVAL
Official GitHub repo for E-Eval, a Chinese K12 education evaluation benchmark for LLMs.
☆32 · Feb 19, 2024 · Updated 2 years ago
aryopg / mmlu-redux
☆32 · Nov 9, 2024 · Updated last year
tongzeliang / EvoPrompt
☆13 · Feb 17, 2025 · Updated last year
OpenMOSS / HalluQA
Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"
☆139 · Jun 5, 2024 · Updated 2 years ago
awslabs / rag-qa-arena
☆53 · Aug 14, 2024 · Updated last year
zhiyiscs / MoRA
[MICCAI 2024] MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality
☆14 · Sep 26, 2025 · Updated 9 months ago
THUDM / LongReward
☆63 · Oct 29, 2024 · Updated last year
horizon-llm / Think-RM
[NeurIPS 2025] Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
☆17 · Nov 2, 2025 · Updated 8 months ago
IBM / benchbench
A package dedicated to running benchmark agreement testing
☆19 · Sep 18, 2025 · Updated 10 months ago
benjaminocampo / ISHate
This repository contains the dataset and implementation details of the paper "An In-depth Analysis of Implicit and Subtle Hate Speech Mes…
☆10 · May 9, 2024 · Updated 2 years ago
Trust4AI / ASTRAL
Automated Safety Testing of Large Language Models
☆17 · Jan 31, 2025 · Updated last year
THU-KEG / R-Eval
[KDD24-ADS] R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models
☆11 · Apr 9, 2024 · Updated 2 years ago
KKsimi / StoHisNet
☆13 · Jan 5, 2025 · Updated last year
rainavyas / prepend_acoustic_attack
Prepend universal audio attack segment to mute Whisper
☆41 · Jan 22, 2025 · Updated last year
sherrydoge / TransLiver
Code for MICCAI2023 paper: TransLiver: A Hybrid Transformer Model for Multi-phase Liver Lesion Classification
☆18 · Jan 10, 2024 · Updated 2 years ago
kangreen0210 / LIME
Accelerating the development of large multimodal models (LMMs) with lmms-eval
☆14 · Oct 14, 2024 · Updated last year
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
OpenLMLab / LongWanjuan
Towards Systematic Measurement for Long Text Quality
☆39 · Sep 5, 2024 · Updated last year
Hongcheng-Gao / HAVEN
Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆25 · Oct 22, 2025 · Updated 8 months ago
WangZX-0630 / Road-Extraction-using-Swin-Transformer-and-CNN
☆18 · Apr 6, 2022 · Updated 4 years ago
zzr728 / SAGAN
☆13 · May 22, 2024 · Updated 2 years ago
THUNLP-AIPoet / CCPM
☆43 · Aug 21, 2021 · Updated 4 years ago
ByteDance-Seed / WideSearch
WideSearch: Benchmarking Agentic Broad Info-Seeking
☆147 · Oct 9, 2025 · Updated 9 months ago
Seeing-Fast-and-Slow / Seeing-Fast-and-Slow
☆16 · May 28, 2026 · Updated last month
google-research-datasets / maxm
MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),…
☆13 · Jan 16, 2024 · Updated 2 years ago
washing1127 / publicRepos_mnbvc
☆11 · Apr 10, 2023 · Updated 3 years ago
aggiejiang / SWSR
A new release of a Chinese sexism dataset and lexicon
☆14 · May 23, 2023 · Updated 3 years ago
FrankYang-17 / Mavors
☆16 · May 30, 2025 · Updated last year
SimpleVQA / SimpleVQA
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
☆15 · Feb 20, 2025 · Updated last year
ECNU-ICALK / EduChat-Math
[MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
☆55 · Oct 20, 2024 · Updated last year
icip-cas / SSO
A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza…
☆20 · Nov 21, 2024 · Updated last year