benlipkin / probsem
Probabilistic LLM evaluations. [CogSci2023; ACL2023]
☆73 · Updated 7 months ago
Alternatives and similar repositories for probsem:
Users who are interested in probsem are comparing it to the libraries listed below.
- Code repository for the c-BTM paper ☆105 · Updated last year
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 5 months ago
- ☆44 · Updated 3 months ago
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State ☆18 · Updated last year
- Experiments with generating opensource language model assistants ☆97 · Updated last year
- Public Inflection Benchmarks ☆68 · Updated 11 months ago
- ☆93 · Updated 2 months ago
- Mechanistic Interpretability for Transformer Models ☆49 · Updated 2 years ago
- ☆33 · Updated last year
- Multi-Domain Expert Learning ☆67 · Updated last year
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models" ☆41 · Updated last year
- LLM sampling method for enforcing syntax adherence in generated output ☆23 · Updated last year
- ☆22 · Updated last year
- One stop shop for all things carp ☆59 · Updated 2 years ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆46 · Updated last year
- ☆73 · Updated 10 months ago
- ☆45 · Updated 11 months ago
- ☆84 · Updated 2 weeks ago
- A domain-specific probabilistic programming language for modeling and inference with language models ☆116 · Updated last year
- A library for squeakily cleaning and filtering language datasets. ☆46 · Updated last year
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆69 · Updated last year
- ☆80 · Updated last month
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Updated last year
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI ☆94 · Updated 2 years ago
- A repository for transformer critique learning and generation ☆88 · Updated last year
- Measuring the situational awareness of language models ☆34 · Updated last year
- Reasoning by Communicating with Agents ☆25 · Updated 4 months ago
- Evaluating LLMs with CommonGen-Lite ☆89 · Updated 11 months ago
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆25 · Updated last year
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning ☆33 · Updated last month