Awesome LLM Benchmarks for evaluating LLMs across text, code, image, audio, video, and more.
☆160 · Jan 3, 2024 · Updated 2 years ago
Alternatives and similar repositories for awesome-LLM-benchmarks
Users interested in awesome-LLM-benchmarks compare it with the libraries listed below.
- Open foundation models, such as Llama 2, ChatGLM, etc. · ☆119 · Sep 18, 2024 · Updated last year
- Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of LLMs… · ☆617 · Nov 24, 2025 · Updated 4 months ago
- Chinese large models · ☆6,417 · Nov 30, 2024 · Updated last year
- LLM evaluation. · ☆16 · Nov 7, 2023 · Updated 2 years ago
- Ethnic-culture dataset for large models · ☆12 · May 26, 2025 · Updated 9 months ago
- A complete guide to evaluating LLMs and RAG systems, covering both theory and code-based approaches. · ☆29 · Nov 16, 2023 · Updated 2 years ago
- Dataset and codes for SEntFiN · ☆10 · May 31, 2023 · Updated 2 years ago
- Scalable Meta-Evaluation of LLMs as Evaluators · ☆43 · Feb 15, 2024 · Updated 2 years ago
- Dilation Gate CNN For Machine Reading Comprehension · ☆17 · Mar 24, 2023 · Updated 2 years ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation · ☆29 · Aug 19, 2025 · Updated 7 months ago
- Scorer for grammatical error correction systems. · ☆14 · Feb 24, 2016 · Updated 10 years ago
- Machine learning methods for spatially resolved transcriptomics with histology images: a collection of related resources. · ☆10 · Mar 22, 2022 · Updated 4 years ago
- FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability · ☆15 · Nov 2, 2024 · Updated last year
- SuperCLUE: a comprehensive benchmark for general-purpose Chinese large models | A Benchmark for Foundation Models in Chinese · ☆3,279 · Feb 6, 2026 · Updated last month
- Official implementation of the NIPS 2022 paper "Pre-activation Distributions Expose Backdoor Neurons" · ☆15 · Jan 13, 2023 · Updated 3 years ago
- Qwen-Efficient-Tuning · ☆44 · Aug 16, 2023 · Updated 2 years ago
- Official repository of the paper "Exploring Superior Function Calls via Reinforcement Learning". · ☆34 · Aug 11, 2025 · Updated 7 months ago
- ☆57 · Mar 14, 2026 · Updated last week
- Code for the ICLR 2021 paper "On Dyadic Fairness: Exploring and Mitigating Bias in Graph Connections" · ☆12 · Jul 17, 2021 · Updated 4 years ago
- One-Class Convolutional Neural Network: a PyTorch implementation, with further optimizations to come! · ☆13 · Oct 27, 2022 · Updated 3 years ago
- FlagEval is an evaluation toolkit for large AI foundation models. · ☆338 · Apr 24, 2025 · Updated 11 months ago
- [Text Analytics] Development of a Korean-language LLM specialized for the legal domain · ☆15 · Sep 14, 2025 · Updated 6 months ago
- ☆14 · Dec 13, 2021 · Updated 4 years ago
- Neural combinatorial optimization with equivariant quantum circuits. · ☆12 · May 13, 2022 · Updated 3 years ago
- ☆18 · Feb 20, 2025 · Updated last year
- Southeast University multimodal knowledge graph: OpenRichpedia project files · ☆29 · Aug 28, 2021 · Updated 4 years ago
- Evolutionary Multi-objective Optimization based Neural Architecture Search for Cognitive Diagnosis · ☆12 · Sep 5, 2024 · Updated last year
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. · ☆61 · Oct 1, 2024 · Updated last year
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation · ☆14 · Sep 2, 2024 · Updated last year
- Lightweight and Flexible Library for Creating Agents and Multi-Agent Conversations 🤖 · ☆29 · Updated this week
- ☆15 · Dec 26, 2017 · Updated 8 years ago
- OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, … · ☆6,788 · Updated this week
- PyTorch attentional NMT (with NLL, MRT, REINFORCE, and MIXER training objectives) · ☆13 · May 12, 2017 · Updated 8 years ago
- The replication package of "CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back" · ☆13 · Feb 8, 2023 · Updated 3 years ago
- ☆22 · Sep 20, 2022 · Updated 3 years ago
- Large language model for the petroleum domain · ☆17 · Feb 22, 2024 · Updated 2 years ago
- Fetches confusable characters, including those with the same pronunciation, a similar pronunciation, or a similar glyph shape · ☆20 · Jan 20, 2023 · Updated 3 years ago
- GPTCloneBench is a clone detection benchmark based on SemanticCloneBench and GPT. · ☆16 · Feb 5, 2025 · Updated last year
- Unofficial Implementation of Evolutionary Model Merging · ☆41 · Mar 28, 2024 · Updated last year