benlipkin / probsem
Probabilistic LLM evaluations. [CogSci2023; ACL2023]
☆73 · Updated 6 months ago
Alternatives and similar repositories for probsem:
Users who are interested in probsem are comparing it to the libraries listed below:
- Code repository for the c-BTM paper ☆105 · Updated last year
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 · Updated last year
- Utilities for the HuggingFace transformers library ☆64 · Updated 2 years ago
- Functional Benchmarks and the Reasoning Gap ☆82 · Updated 3 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆69 · Updated last year
- Mechanistic Interpretability for Transformer Models ☆49 · Updated 2 years ago
- A domain-specific probabilistic programming language for modeling and inference with language models ☆114 · Updated last year
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… ☆200 · Updated 2 weeks ago
- Evaluating LLMs with CommonGen-Lite ☆88 · Updated 10 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆45 · Updated last year
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆64 · Updated 7 months ago
- Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning" ☆33 · Updated 3 weeks ago
- gzip Predicts Data-dependent Scaling Laws ☆33 · Updated 8 months ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆40 · Updated last year
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models" ☆40 · Updated last year
- Extract full next-token probabilities via language model APIs ☆229 · Updated 11 months ago
- Reimplementation of the task generation part from the Alpaca paper ☆119 · Updated last year
- [Added T5 support to TRLX] A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆47 · Updated 2 years ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆47 · Updated last month
- Experiments with generating opensource language model assistants ☆97 · Updated last year
- Code for the ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI ☆92 · Updated last year
- Public Inflection Benchmarks ☆69 · Updated 10 months ago