facebookresearch / irt-leaderboard
Leaderboards are widely used in NLP and push the field forward. While leaderboards are a straightforward ranking of NLP models, this simplicity can mask nuances in evaluation items (examples) and subjects (NLP models). Rather than replace leaderboards, we advocate a re-imagining so that they better highlight if and where progress is made. Buildi…
☆17 Updated 3 years ago
Alternatives and similar repositories for irt-leaderboard
Users who are interested in irt-leaderboard are comparing it to the repositories listed below.
- Bayesian Assessment of Hypotheses ☆24 Updated last year
- Author implementation of the paper "Don’t paraphrase, detect! Rapid and Effective Data Collection for Semantic Parsing" ☆20 Updated 4 years ago
- Replication code for "With Little Power Comes Great Responsibility" ☆39 Updated 4 years ago
- Chu-Lui-Edmonds decoding extracted from TurboParser ☆14 Updated 8 years ago
- Python code for training models in the ACL paper, "Simple and Effective Paraphrastic Similarity from Parallel Translations". ☆22 Updated 5 years ago
- ☆29 Updated last year
- Defeasible Natural Language Inference ☆12 Updated 4 years ago
- ☆12 Updated 4 years ago
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant. ☆31 Updated 5 years ago
- ☆24 Updated 5 years ago
- TextGraphs-13 Shared Task on Multi-Hop Inference Explanation Regeneration ☆44 Updated 5 years ago
- Zero-Shot Open Entity Typing as Type-Compatible Grounding, EMNLP'18. ☆42 Updated 5 years ago
- ☆19 Updated 5 years ago
- Frame-Semantic and PropBank Semantic Role Labeling with Syntactic Scaffolding. ☆50 Updated 3 years ago
- Post-editing Datasets by Rakuten (PEDRa) ☆14 Updated 4 years ago
- Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) ☆22 Updated 5 years ago
- Statistics on multilingual datasets ☆17 Updated 2 years ago
- Code for our ACL '20 paper "Representation Engineering with Natural Language Explanations" ☆29 Updated 5 years ago
- Hyperparameter search for AllenNLP - powered by Ray TUNE ☆28 Updated 3 months ago
- The Universal Decompositional Semantics (UDS) dataset and the Decomp toolkit ☆57 Updated 2 years ago
- ☆13 Updated 4 years ago
- Code for ModularQA ☆28 Updated 4 years ago
- Workshop Home Page for Benchmarking: Past, Present and Future ☆35 Updated 3 years ago
- ☆27 Updated 2 years ago
- A Mechanical Turk Interface (amti) 🤖 ☆56 Updated last year
- Pytorch Seq2Seq framework ☆27 Updated 8 months ago
- ☆46 Updated 5 years ago
- Syntactic evaluation sets, attribute-varying grammars, and code for replicating the CLAMS paper. ACL 2020. ☆16 Updated 7 months ago
- ☆11 Updated 9 years ago
- MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale ☆14 Updated 4 years ago