benlipkin / probsem
Probabilistic LLM evaluations. [CogSci2023; ACL2023]
☆73 Updated 10 months ago
Alternatives and similar repositories for probsem
Users who are interested in probsem are comparing it to the libraries listed below
- Code repository for the c-BTM paper ☆106 Updated last year
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆70 Updated 2 years ago
- Experiments with generating opensource language model assistants ☆97 Updated 2 years ago
- Mechanistic Interpretability for Transformer Models ☆51 Updated 3 years ago
- ☆44 Updated 6 months ago
- Measuring the situational awareness of language models ☆35 Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆48 Updated last year
- Experiments for efforts to train a new and improved t5 ☆77 Updated last year
- Functional Benchmarks and the Reasoning Gap ☆86 Updated 8 months ago
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models" ☆44 Updated last year
- ☆35 Updated 2 years ago
- Public Inflection Benchmarks ☆68 Updated last year
- ☆74 Updated last year
- ☆28 Updated last year
- Evaluating LLMs with CommonGen-Lite ☆90 Updated last year
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning ☆33 Updated 4 months ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆43 Updated 6 months ago
- A repository for transformer critique learning and generation ☆90 Updated last year
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 Updated last year
- ☆22 Updated last year
- An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024) ☆33 Updated 11 months ago
- ☆96 Updated 3 months ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆55 Updated last year
- Small, simple agent task environments for training and evaluation ☆18 Updated 7 months ago
- Multi-Domain Expert Learning ☆67 Updated last year
- A domain-specific probabilistic programming language for modeling and inference with language models ☆130 Updated last month
- A library for squeakily cleaning and filtering language datasets. ☆47 Updated last year
- ☆36 Updated 2 years ago
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes. ☆81 Updated last year
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆76 Updated last year