☆77 · Jan 24, 2025 · Updated last year
Alternatives and similar repositories for ChineseSimpleQA
Users that are interested in ChineseSimpleQA are comparing it to the libraries listed below.
- ☆35 · Jan 7, 2025 · Updated last year
- ☆46 · Mar 4, 2025 · Updated last year
- ☆17 · Mar 13, 2025 · Updated last year
- ☆151 · May 14, 2025 · Updated 10 months ago
- Benchmarking Benchmark Leakage in Large Language Models ☆60 · May 20, 2024 · Updated last year
- Debug DeepSpeed-Chat step by step in IDE (在IDE里一步一步调试DeepSpeed-Chat) ☆10 · Apr 17, 2023 · Updated 2 years ago
- Official github repo for E-Eval, a Chinese K12 education evaluation benchmark for LLMs. ☆29 · Feb 19, 2024 · Updated 2 years ago
- ☆27 · Nov 20, 2023 · Updated 2 years ago
- ☆47 · Dec 12, 2024 · Updated last year
- [ACL 2025] Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL ☆15 · Oct 9, 2025 · Updated 6 months ago
- ☆19 · Nov 5, 2024 · Updated last year
- PyTorch implementation for NAACL 2022 paper: "Document-Level Relation Extraction with Sentences Importance Estimation and Focusing" ☆17 · Apr 29, 2022 · Updated 3 years ago
- ☆52 · Aug 14, 2024 · Updated last year
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆138 · Jun 5, 2024 · Updated last year
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆29 · Jul 9, 2025 · Updated 9 months ago
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning ☆117 · Feb 2, 2026 · Updated 2 months ago
- ☆35 · Mar 6, 2026 · Updated last month
- ☆15 · Apr 11, 2024 · Updated last year
- ☆21 · Jan 3, 2026 · Updated 3 months ago
- ☆62 · Oct 29, 2024 · Updated last year
- ☆31 · Feb 28, 2026 · Updated last month
- [MICCAI 2024] MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality ☆12 · Sep 26, 2025 · Updated 6 months ago
- A package dedicated for running benchmark agreement testing ☆18 · Sep 18, 2025 · Updated 6 months ago
- ☆33 · Jan 26, 2026 · Updated 2 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆107 · Mar 6, 2025 · Updated last year
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation". ☆24 · Oct 22, 2025 · Updated 5 months ago
- Towards Systematic Measurement for Long Text Quality ☆38 · Sep 5, 2024 · Updated last year
- ☆29 · Feb 27, 2026 · Updated last month
- Open-source evaluation toolkit of large vision-language models (LVLMs), support ~100 VLMs, 30+ benchmarks ☆15 · Feb 17, 2025 · Updated last year
- ☆17 · Apr 6, 2022 · Updated 4 years ago
- ☆12 · Apr 10, 2023 · Updated 3 years ago
- ☆13 · May 22, 2024 · Updated last year
- This repository contains the dataset and implementation details of the paper "An In-depth Analysis of Implicit and Subtle Hate Speech Mes… ☆10 · May 9, 2024 · Updated last year
- ☆12 · Jan 5, 2025 · Updated last year
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),… ☆13 · Jan 16, 2024 · Updated 2 years ago
- OWASP Foundation web repository ☆17 · Oct 11, 2025 · Updated 5 months ago
- ☆12 · Nov 2, 2025 · Updated 5 months ago
- Code for MICCAI2023 paper: TransLiver: A Hybrid Transformer Model for Multi-phase Liver Lesion Classification ☆18 · Jan 10, 2024 · Updated 2 years ago
- ☆15 · May 30, 2025 · Updated 10 months ago