zli12321/qa_metrics

An easy-to-use Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics: black-box and open-source large language model prompting and evaluation, exact match, F1 score, PEDANT semantic match, and transformer match. The package also supports prompting the OpenAI and Anthropic APIs.
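For readers unfamiliar with the two lexical metrics named in the description, here is a minimal sketch of exact match and token-level F1 as they are commonly defined for extractive QA (SQuAD-style normalization). This is not the qa_metrics API: the function names `normalize`, `exact_match`, and `token_f1` are hypothetical, and the package's own normalization rules may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; qa_metrics may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall over normalized tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact match is strict: "Paris, France" does not match "Paris". Token F1 gives that answer partial credit because one of its two tokens overlaps with the reference. Semantic metrics such as PEDANT or transformer match aim to close the remaining gap, where a correct answer shares few tokens with the reference.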

☆61

Alternatives and similar repositories for qa_metrics

Users who are interested in qa_metrics are comparing it to the libraries listed below.

zli12321 / VideoHallu
Synthetic Video hallucination and Mitigation
☆23 · Sep 21, 2025 · Updated 10 months ago
zli12321 / Vision-SR1
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
☆175 · Mar 14, 2026 · Updated 4 months ago
zhliu0106 / learning-to-refuse
Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"
☆10 · Dec 13, 2024 · Updated last year
EnnengYang / Efficient-WEMoE
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.
☆16 · Oct 28, 2024 · Updated last year
HelloEveryboby / Butler
Butler is a tool project for automated service management and task scheduling.
☆17 · Updated this week
Lslland / T-Vaccine
☆19 · Jun 21, 2025 · Updated last year
boyiwei / CoTaEval
[NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models
☆17 · Jul 17, 2024 · Updated 2 years ago
NJUPT-SAST / aurora-ui
🌏 UI component library for the future, based on WebComponent.
☆23 · Nov 12, 2024 · Updated last year
ethz-spylab / unlearning-vs-safety
☆27 · Oct 6, 2024 · Updated last year
OPTML-Group / SOUL
Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"
☆30 · Oct 1, 2024 · Updated last year
Hongyang-Du / awesome-3d-datasets
[CVPRW'26] A collection and survey of 3d dataset
☆33 · Jun 4, 2026 · Updated last month
1andrevich / antifilter-domain
Generated geosite.dat based on Antifilter Community List
☆29 · Updated this week
jaechan-repo / muse_bench
☆33 · Aug 9, 2024 · Updated last year
ezubaric / jbg-web
Source code for Jordan Boyd-Graber's academic webpage.
☆12 · Jul 5, 2026 · Updated 2 weeks ago
princeton-nlp / benign-data-breaks-safety
☆47 · Oct 1, 2024 · Updated last year
phax / en16931-cii2ubl
Converter for EN16931 invoices from CII to UBL
☆45 · Updated this week
Babelscape / ALERT
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
☆60 · Sep 20, 2024 · Updated last year
AI21Labs / factor
Code and data for the FACTOR paper
☆54 · Nov 15, 2023 · Updated 2 years ago
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆51 · Jan 15, 2026 · Updated 6 months ago
BAAI-WuDao / Data
"WuDao" data
☆51 · Jul 5, 2021 · Updated 5 years ago
TCMAI-BJTU / LingdanLLM
TCM Lingdan LLM
☆51 · Jun 1, 2026 · Updated last month
randalburns / jhupp-lectures
Notebooks for JHU EN 601.320/420/620
☆10 · May 1, 2019 · Updated 7 years ago
DA-southampton / RedGPT
☆70 · Apr 14, 2023 · Updated 3 years ago
licong-lin / negative-preference-optimization
☆76 · Jul 15, 2024 · Updated 2 years ago
lixiang-222 / CIDGMed
Knowledge-Based System'25
☆11 · Dec 15, 2024 · Updated last year
alejandro-ao / crewai-crash-course
Tutorial: Introduction to CrewAI
☆84 · Jul 5, 2024 · Updated 2 years ago
kstats / CausalQG
☆15 · Apr 19, 2021 · Updated 5 years ago
ConiferLM / Conifer
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
☆91 · Apr 4, 2024 · Updated 2 years ago
snap-stanford / MAG
Programs for Microsoft Academic Graph
☆16 · Jun 8, 2016 · Updated 10 years ago
amazon-science / bold
Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper
☆88 · Mar 2, 2021 · Updated 5 years ago
PxYu / entity-expansion
Corpus-based Set Expansion with Lexical Features and Distributed Representations (SIGIR '19)
☆13 · Jul 18, 2019 · Updated 7 years ago
Yale-LILY / FeTaQA
Dataset for TACL 2022 paper: "FeTaQA: Free-form Table Question Answering"
☆90 · May 11, 2023 · Updated 3 years ago
sailing-lab / sr2am
SR²AM: Efficient Agentic Reasoning Through Self-Regulated Simulative Planning
☆21 · May 22, 2026 · Updated 2 months ago
wuxiyang1996 / AutoHallusion
AutoHallusion Codebase (EMNLP 2024)
☆23 · Dec 6, 2024 · Updated last year
peterbhase / LAS-NL-Explanations
Code for paper "Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?"
☆21 · Oct 13, 2020 · Updated 5 years ago
codereport / hoogle-translate
☆136 · May 7, 2026 · Updated 2 months ago
cxfann / Flame
☆15 · May 19, 2026 · Updated 2 months ago
synlp / ChiMed-GPT
ChiMed-GPT is a Chinese medical large language model (LLM) built by continually training Ziya-v2 on Chinese medical data, where pre-train…
☆106 · Dec 29, 2023 · Updated 2 years ago
HanGuo97 / lq-lora
☆129 · Jan 22, 2024 · Updated 2 years ago