MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and validity
☆12 · Nov 6, 2023 · Updated 2 years ago
Alternatives and similar repositories for MetricEval
Users who are interested in MetricEval are comparing it to the repositories listed below.
- MCP wrapper for OpenAI built-in tools ☆12 · Mar 13, 2025 · Updated last year
- Constructing a community of LLM-based agents in Minecraft ☆17 · Nov 27, 2025 · Updated 3 months ago
- A curated list of personalized language model / large language model (continually updated) ☆10 · Nov 17, 2023 · Updated 2 years ago
- A curated list of resources dedicated to NLP (papers, blogs, notes, etc.) ☆13 · Nov 30, 2019 · Updated 6 years ago
- ☆10 · Feb 16, 2025 · Updated last year
- ☆12 · Mar 22, 2024 · Updated last year
- A Python implementation for computing the PoR metric for video summarization from "Performance over Random: A Robust Evaluation Protocol … ☆10 · May 4, 2022 · Updated 3 years ago
- ☆13 · Mar 30, 2021 · Updated 4 years ago
- A lightweight version of rosdoc that does not rely on ROS infrastructure for crawling packages. ☆10 · Apr 16, 2024 · Updated last year
- Code for the paper "What Makes Better Augmentation Strategies? Augment Difficult but Not too Different" (ICLR 22) ☆12 · Aug 28, 2023 · Updated 2 years ago
- ☆19 · Nov 7, 2022 · Updated 3 years ago
- bigdata-bootcamp for graduate students in statistics at Seoul National University ☆26 · Aug 26, 2025 · Updated 6 months ago
- Exploring limitations of LLM-as-a-judge ☆20 · Aug 17, 2024 · Updated last year
- ☆16 · Nov 4, 2025 · Updated 4 months ago
- A TensorFlow implementation of KGPL ☆11 · Jan 1, 2021 · Updated 5 years ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective ☆41 · Oct 17, 2023 · Updated 2 years ago
- Awesome LLM for NLG Evaluation Papers ☆25 · Jan 23, 2024 · Updated 2 years ago
- Analyzing Latent Concept in Pre-trained Transformer Models ☆12 · Jul 18, 2022 · Updated 3 years ago
- ☆11 · Apr 4, 2025 · Updated 11 months ago
- Implementation for https://arxiv.org/abs/2005.00652 ☆28 · Dec 8, 2022 · Updated 3 years ago
- Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in … ☆30 · Nov 25, 2021 · Updated 4 years ago
- Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs ☆14 · Feb 10, 2026 · Updated last month
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆39 · Jan 12, 2024 · Updated 2 years ago
- ☆10 · Sep 17, 2022 · Updated 3 years ago
- Code release for "TempLM: Distilling Language Models into Template-Based Generators" ☆14 · Jul 21, 2022 · Updated 3 years ago
- tiny-dnn based on libtorch: a header-only deep learning framework with no dependencies other than libtorch ☆37 · Nov 21, 2024 · Updated last year
- Fortifying Toxic Speech Detectors Against Veiled Toxicity ☆11 · Oct 21, 2020 · Updated 5 years ago
- Tensor Extraction of Latent Features (T-ELF). Within T-ELF's arsenal are non-negative matrix and tensor factorization solutions, equipped… ☆25 · Oct 8, 2025 · Updated 5 months ago
- A Corpus of Natural Language Instructions for Collaborative Manipulation ☆15 · Feb 15, 2017 · Updated 9 years ago
- tkPDFViewer is a Python library developed by Roshan Paswan that lets you embed a PDF file in your tkinter GUI. ☆13 · Dec 4, 2022 · Updated 3 years ago
- Debug DeepSpeed-Chat step by step in an IDE ☆10 · Apr 17, 2023 · Updated 2 years ago
- Code for the paper entitled "Towards Driving-Oriented Metric for Lane Detection Models" (CVPR 2022) ☆25 · Mar 19, 2022 · Updated 4 years ago
- ☆39 · Jun 7, 2023 · Updated 2 years ago
- Code for "Uncovering Hidden Challenges in Query-Based Video Moment Retrieval" ☆20 · Sep 7, 2020 · Updated 5 years ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆87 · Sep 12, 2024 · Updated last year
- Dataset from the Tip of the Tongue Known-Item Retrieval (2021) paper. ☆12 · Nov 4, 2021 · Updated 4 years ago
- Waf extension for writing computational experiments. ☆43 · Oct 17, 2015 · Updated 10 years ago
- A list of all named GANs! ☆22 · Aug 4, 2017 · Updated 8 years ago
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆78 · Jul 18, 2025 · Updated 8 months ago