YangLinyi / GLUE-X

We leverage 14 datasets as out-of-distribution (OOD) test data and conduct evaluations on 8 NLU tasks over 21 widely used models. Our findings confirm that OOD accuracy in NLP tasks deserves more attention, since a significant performance decay relative to in-distribution (ID) accuracy is observed in all settings.

☆93

Alternatives and similar repositories for GLUE-X

Users interested in GLUE-X are comparing it to the libraries listed below.

HSLiu-Initial / CtrlA
This includes the original implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control.
☆62 · Updated last year
yiyihum / da-code
[EMNLP 2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
☆78 · Updated 3 months ago
longyuewangdcu / Document-MT-LLM
☆102 · Updated 2 years ago
Yueeeeeeee / HRPO
[NeurIPS 2025] Hybrid Latent Reasoning via Reinforcement Learning
☆155 · Updated last month
zhuang-li / SCAR
[ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Mode…
☆37 · Updated 2 months ago
S1s-Z / SCL-RAI
[COLING'22] Code for "SCL-RAI: Span-based Contrastive Learning with Retrieval Augmented Inference for Unlabeled Entity Problem in NER"
☆46 · Updated last year
MrYxJ / enhance_long
This tool (enhance_long) aims to enhance Llama 2's long-context extrapolation capability at the lowest possible cost, preferably without …
☆45 · Updated last year
bird-bench / livesqlbench
☆109 · Updated 3 weeks ago
Flitternie / GraphQ_IR
A Unified Intermediate Representation for Graph Query Languages
☆66 · Updated 2 years ago
Ablustrund / MPLSandbox
MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler a…
☆177 · Updated 6 months ago
chatsci / Aeiva
A general AI agent framework that can be adapted to various tasks and environments.
☆102 · Updated 8 months ago
jzhoubu / vsearch
An Extensible Framework for Retrieval-Augmented LLM Applications: Learning Relevance Beyond Simple Similarity.
☆39 · Updated 10 months ago
yaoching0 / GaC
☆50 · Updated last year
ShuaiLyu0110 / SQL-o1
SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL
☆191 · Updated 5 months ago
SHUMKASHUN / Plots
This repo contains my custom-styled, Python-based plots for NLP papers, including my reproductions of plots from my favourite papers.
☆40 · Updated last year
Rafa-zy / QLASS
☆52 · Updated 2 months ago
yileijin / PayAttn
Official Implementation of "Pay Attention to What You Need"
☆42 · Updated 8 months ago
S1s-Z / SANTA
[ACL'23] Code for "SANTA: Separate Strategies for Inaccurate and Incomplete Annotation Noise in Distantly-Supervised Named Entity Recogni…
☆40 · Updated 5 months ago
syr-cn / AutoRefine
[NeurIPS 2025 Poster] Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning
☆100 · Updated last week
Mercury7353 / PyBench
LLM Benchmark for Code
☆31 · Updated last year
IAAR-Shanghai / Grimoire
Grimoire is All You Need for Enhancing Large Language Models
☆117 · Updated last year
luo-junyu / RobustFT
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
☆42 · Updated 10 months ago
Davion-Liu / Awesome-Robustness-in-Information-Retrieval
A curated list of awesome papers related to adversarial attacks and defenses for information retrieval. If I missed any papers, feel free…
☆218 · Updated last year
IAAR-Shanghai / ICSFSurvey
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin…
☆169 · Updated 10 months ago
tapilot-crossing / tapilot_code
☆44 · Updated last year
heng840 / AMIG
Code of Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Ne…
☆26 · Updated last year
duguodong7 / Awesome-Knowledge-Fusion
A collection of papers related to knowledge fusion
☆59 · Updated last year
qianc62 / Corsair
Counterfactual-inference-based Text-classification Debiasing Framework.
☆82 · Updated 4 years ago
ColinLu50 / SafeDelta
The official code repo for "Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets" in ICML 2025.
☆56 · Updated 3 months ago
uw-nsl / TinyV
Your efficient and accurate answer verification system for RL training.
☆41 · Updated 4 months ago