Zhiyuan-Zeng/EvalTree

[COLM 2025] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

☆31

Alternatives and similar repositories for EvalTree

Users who are interested in EvalTree are comparing it to the libraries listed below.

allenai / fluid-benchmarking
Fluid Language Model Benchmarking
☆29 · Sep 16, 2025 · Updated 10 months ago
cxcscmu / Montessori-Instruct
Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]
☆51 · Jan 24, 2025 · Updated last year
stellalisy / mediQ
☆44 · Jan 26, 2025 · Updated last year
oriyor / turning_tables
Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…
☆22 · Nov 2, 2021 · Updated 4 years ago
HazyResearch / wonderbread
WONDERBREAD benchmark + dataset for BPM tasks
☆35 · Jul 30, 2025 · Updated 11 months ago
gd-zhang / noisy-quadratic-model
Large-batch Training, Neural Network Optimization
☆10 · Nov 8, 2019 · Updated 6 years ago
RulinShao / massive-serve
Python package for serving a local search engine. One command to download and serve a datastore; that's it 😎.
☆26 · Jun 6, 2025 · Updated last year
peterbhase / ExplanationSearch
Code for paper "Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals"
☆18 · Oct 17, 2022 · Updated 3 years ago
thunlp / Modularity-Analysis
[ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers
☆26 · Jun 7, 2023 · Updated 3 years ago
sparkle-reasoning / sparkle
[NeurIPS'25] Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
☆16 · Dec 12, 2025 · Updated 7 months ago
mickymultani / nvidia-NIM-RAG
This project demonstrates the power and simplicity of NVIDIA NIM (NVIDIA Inference Microservices), a suite of optimized cloud-native microservices, by…
☆16 · Mar 21, 2024 · Updated 2 years ago
SLAB-NLP / Multi-Prompt-LLM-Evaluation
State of What Art? A Call for Multi-Prompt LLM Evaluation
☆16 · Apr 10, 2026 · Updated 3 months ago
ozyyshr / ShareGPT_investigation
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions (EMNLP 2023)
☆13 · Dec 21, 2023 · Updated 2 years ago
steveazzolin / gnn_logic_global_expl
Official repository of GLGExplainer (ICLR 2023)
☆21 · Jun 7, 2024 · Updated 2 years ago
ictnlp / TACS
Source code for Truth-Aware Context Selection: Mitigating the Hallucinations of Large Language Models Being Misled by Untruthful Contexts
☆17 · Sep 2, 2024 · Updated last year
AIM3-RUC / VideoIC
Danmaku dataset
☆12 · Jul 7, 2023 · Updated 3 years ago
nguyentthong / READ
[AAAI’24 Main] READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Vi…
☆10 · Jan 24, 2025 · Updated last year
dqwang122 / CALMS
Code and dataset for 'Contrastive Aligned Joint Learning for Multilingual Summarization'
☆13 · Mar 24, 2022 · Updated 4 years ago
kenchan0226 / FineGrainedFact
Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarizatio…
☆15 · Jan 25, 2024 · Updated 2 years ago
successar / FRESH
☆26 · Jun 12, 2023 · Updated 3 years ago
xuyige / FGRL4KG
Source code for Findings of EMNLP 2021 paper "Keyphrase Generation with Fine-Grained Evaluation-Guided Reinforcement Learning"
☆13 · Nov 9, 2021 · Updated 4 years ago
salesforce / MPT
☆16 · Jun 12, 2023 · Updated 3 years ago
jiacheng-xu / lattice-generation
Code for Massive-scale Decoding for Text Generation using Lattices
☆44 · Jul 29, 2022 · Updated 4 years ago
alibaba-mmai-research / HiCo
CVPR 2022: Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
☆18 · Aug 10, 2022 · Updated 3 years ago
uivision / UI-Vision
☆33 · Jul 3, 2025 · Updated last year
thunlp / Seq2Seq-Prompt
Source code for COLING 2022 paper "Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models"
☆24 · Sep 21, 2022 · Updated 3 years ago
peijunallin / alphalora
☆19 · Nov 10, 2024 · Updated last year
yzjiao / RolePred
Source code for EMNLP Findings paper "Open-Vocabulary Argument Role Prediction for Event Extraction"
☆19 · Nov 5, 2022 · Updated 3 years ago
LeeSureman / E5-Retrieval-Reproduction
Use contrastive learning to train a large language model (LLM) as a retriever
☆12 · Jul 19, 2024 · Updated 2 years ago
xiye17 / EvalQAExpl
Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.
☆17 · Apr 25, 2021 · Updated 5 years ago
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆138 · Jul 8, 2024 · Updated 2 years ago
allenai / signal-and-noise
Measuring the Signal to Noise Ratio in Language Model Evaluation
☆31 · Aug 19, 2025 · Updated 11 months ago
zorazrw / odex
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
☆49 · Dec 22, 2023 · Updated 2 years ago
jmhessel / caption_contest_corpus
Corpus to accompany: "Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest"
☆61 · Mar 19, 2025 · Updated last year
joeljang / FLM
All-in-one repository for Fine-tuning & Pretraining (Large) Language Models
☆15 · Mar 8, 2023 · Updated 3 years ago
ajyl / mech_int_othelloGPT
☆10 · Nov 6, 2024 · Updated last year
Heidelberg-NLP / CC-SHAP-VLM
Official code implementation for the paper "Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Expl…
☆12 · Jul 14, 2026 · Updated 2 weeks ago
cylnlp / convsumx
Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation
☆19 · Mar 23, 2024 · Updated 2 years ago
Jhaprince / MultiBully
☆17 · Oct 2, 2024 · Updated last year