byungdoh / llm_surprisal
Surprisal calculation using HuggingFace LMs ("Frequency Explains the Inverse Correlation of Large Language Models’ Size, Training Data Amount, and Surprisal’s Fit to Reading Times," EACL 2024)
☆10 · Updated 6 months ago
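The surprisal of a word w_t is -log2 P(w_t | w_1, …, w_{t-1}). HuggingFace tokenizers usually split words into subword tokens, so word-level surprisal is commonly obtained by summing the surprisals of a word's subword tokens, which equals -log2 of the product of their conditional probabilities. The following is a minimal sketch of only that aggregation step, assuming the per-token natural-log probabilities have already been read off a language model. The function names, the GPT-2-style "Ġ" word-start marker, and the log-probability values are illustrative assumptions, not this repository's code.

```python
import math
from typing import List, Tuple


def token_surprisal_bits(logprob: float) -> float:
    """Convert a natural-log probability ln P(token | context) to surprisal in bits."""
    return -logprob / math.log(2)


def word_surprisals(tokens: List[str], logprobs: List[float]) -> List[Tuple[str, float]]:
    """Aggregate subword-token surprisals into word-level surprisals.

    Assumption for this sketch: tokens follow the GPT-2 byte-level BPE convention,
    where a leading "Ġ" marks the start of a new word. A word's surprisal is the
    sum of its subword surprisals.
    """
    words: List[Tuple[str, float]] = []
    for tok, lp in zip(tokens, logprobs):
        s = token_surprisal_bits(lp)
        if tok.startswith("Ġ") or not words:
            # Start a new word (the very first token always starts one).
            words.append((tok.lstrip("Ġ"), s))
        else:
            # Continuation subword: merge it into the previous word.
            prev_word, prev_s = words[-1]
            words[-1] = (prev_word + tok, prev_s + s)
    return words


if __name__ == "__main__":
    # Hypothetical tokens and made-up probabilities, for illustration only.
    toks = ["The", "Ġcat", "Ġsat", "Ġdown", "stairs"]
    lps = [math.log(p) for p in (0.5, 0.25, 0.125, 0.5, 0.5)]
    print([(w, round(s, 6)) for w, s in word_surprisals(toks, lps)])
    # prints [('The', 1.0), ('cat', 2.0), ('sat', 3.0), ('downstairs', 2.0)]
```

In practice the first token has no preceding context unless a beginning-of-sequence token is prepended, so its surprisal needs separate handling; that detail is outside this sketch.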
Related projects:
- ☆11 · Updated 2 years ago
- A psycholinguistic modeling toolkit · ☆24 · Updated last week
- ☆20 · Updated 3 years ago
- The Universal Decompositional Semantics (UDS) dataset and the Decomp toolkit · ☆55 · Updated last year
- ☆24 · Updated 4 months ago
- A neural language model that estimates incremental processing complexity · ☆38 · Updated 2 years ago
- Tetra-Tagging: Word-Synchronous Parsing with Linear-Time Inference · ☆15 · Updated 4 years ago
- ☆23 · Updated 4 years ago
- Code and data for "A Systematic Assessment of Syntactic Generalization in Neural Language Models" · ☆24 · Updated 3 years ago
- Code and results for "Universals of word order reflect optimization of grammars for efficient communication" · ☆12 · Updated 2 years ago
- Corpus of naturalistic stories with annotation and psycholinguistic measures · ☆49 · Updated 2 years ago
- Unsupervised Grammar Induction with Combinatory Categorial Grammars · ☆10 · Updated 3 years ago
- ☆19 · Updated 3 years ago
- Constituency parser for English and Chinese, built on the RNNG and In-Order parsers with BERT · ☆38 · Updated 4 years ago
- Scripts to evaluate scoped meaning representations · ☆18 · Updated 2 years ago
- Diagnostic tests for linguistic capacities in language models · ☆66 · Updated 2 years ago
- ☆37 · Updated 3 years ago
- ☆15 · Updated 7 months ago
- Scripts for large-scale prediction of lexical semantic change · ☆12 · Updated last year
- ☆37 · Updated 3 years ago
- Analysis pipeline for Revisiting UID (EMNLP 2021) · ☆10 · Updated last year
- Code and CoarseWSD-20 datasets for "Language Models and Word Sense Disambiguation: An Overview and Analysis" · ☆23 · Updated 2 years ago
- Statistics on multilingual datasets · ☆17 · Updated 2 years ago
- This repository houses the IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated se… · ☆19 · Updated 3 years ago
- Code for the paper "Measuring Bias in Contextualized Word Representations" · ☆35 · Updated 5 years ago
- Data and code repository for "Multilingual Fairness Evaluation for Hate Speech Detection" (LREC 2020) · ☆20 · Updated last year
- Evaluating recurrent neural networks on predicting subject-verb agreement dependencies · ☆61 · Updated last year
- ☆12 · Updated 2 years ago
- ☆28 · Updated last year
- A framework for nonlinear continuous-time regression · ☆31 · Updated 4 months ago