benlipkin / probsem
Probabilistic LLM evaluations. [CogSci2023; ACL2023]
☆73 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for probsem
- ☆91 · Updated 7 months ago
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 · Updated last year
- Utilities for the HuggingFace transformers library ☆61 · Updated last year
- Code repository for the c-BTM paper ☆105 · Updated last year
- ☆46 · Updated last month
- ☆71 · Updated 6 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆62 · Updated last year
- Implementation of the paper "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆38 · Updated 2 weeks ago
- Mechanistic Interpretability for Transformer Models ☆49 · Updated 2 years ago
- ☆99 · Updated 3 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆107 · Updated last year
- ☆99 · Updated this week
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆36 · Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ☆45 · Updated 11 months ago
- A domain-specific probabilistic programming language for modeling and inference with language models ☆112 · Updated last year
- Experiments with generating open-source language model assistants ☆97 · Updated last year
- Repository for the paper "Stream of Search: Learning to Search in Language" ☆84 · Updated 3 months ago
- ☆68 · Updated 2 months ago
- Factored Cognition Primer: How to write compositional language model programs ☆48 · Updated last year
- Code for the ICLR paper https://openreview.net/forum?id=-cqvvvb-NkI ☆91 · Updated last year
- ☆85 · Updated 5 months ago
- Experiments for efforts to train a new and improved T5 ☆76 · Updated 6 months ago
- Erasing concepts from neural representations with provable guarantees ☆208 · Updated 3 weeks ago
- Extract full next-token probabilities via language model APIs ☆228 · Updated 8 months ago
- Evaluating LLMs with CommonGen-Lite ☆84 · Updated 7 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- Python library that enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… ☆195 · Updated 4 months ago
- LILO: Library Induction with Language Observations ☆78 · Updated 2 months ago
- Open-source replication of Anthropic's Crosscoders for Model Diffing ☆13 · Updated last week
- Official repo for the NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆61 · Updated last year