ExtensityAI / benchmark
Evaluation of neuro-symbolic engines
☆33 Updated 3 months ago
Related projects
Alternatives and complementary repositories for benchmark
- ☆68 Updated 3 months ago
- Extending Conformal Prediction to LLMs ☆58 Updated 5 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆93 Updated 3 months ago
- Understanding how features learned by neural networks evolve throughout training ☆31 Updated 3 weeks ago
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 Updated last year
- Data and code for the Corr2Cause paper (ICLR 2024) ☆88 Updated 7 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆62 Updated 5 months ago
- ☆28 Updated last year
- ☆44 Updated last year
- Sparse and discrete interpretability tool for neural networks ☆55 Updated 9 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆78 Updated 8 months ago
- ☆26 Updated last year
- Universal Neurons in GPT2 Language Models ☆27 Updated 5 months ago
- Experiments on GPT-3's ability to fit numerical models in-context. ☆14 Updated 2 years ago
- ☆24 Updated 4 months ago
- Discovering Data-driven Hypotheses in the Wild ☆41 Updated this week
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆48 Updated 7 months ago
- ☆49 Updated 6 months ago
- Official implementation of "BERTs are Generative In-Context Learners" ☆19 Updated 5 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆41 Updated 10 months ago
- This is the official repository for all the code of TheoremLlama ☆32 Updated last month
- Minimum Description Length probing for neural network representations ☆16 Updated last week
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆98 Updated 5 months ago
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆34 Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆128 Updated 3 weeks ago
- ☆30 Updated 2 weeks ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆25 Updated 5 months ago
- ☆55 Updated last month
- Understanding the correlation between different LLM benchmarks ☆29 Updated 10 months ago
- ☆70 Updated last year