ExtensityAI / benchmark

Evaluation of neuro-symbolic engines

☆38

Alternatives and similar repositories for benchmark

Users who are interested in benchmark are comparing it to the libraries listed below.

KaiNylund / lm-weights-encode-time
☆69 · Updated 11 months ago
KihoPark / LLM_Categorical_Hierarchical_Representations
☆104 · Updated 5 months ago
probabilistic-inference-scaling / probabilistic-inference-scaling
☆51 · Updated 4 months ago
kanishkg / stream-of-search
Repository for the paper "Stream of Search: Learning to Search in Language"
☆149 · Updated 6 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88 · Updated 10 months ago
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆36 · Updated 9 months ago
google-deepmind / mishax
☆136 · Updated 4 months ago
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆86 · Updated last year
METR / RE-Bench
☆95 · Updated 3 months ago
hamishivi / EasyLM
Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆75 · Updated 11 months ago
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆63 · Updated last year
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024)
☆190 · Updated last year
EdinburghNLP / torch-adaptive-imle
☆35 · Updated 8 months ago
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆225 · Updated 2 weeks ago
jonhue / activeft
PyTorch library for Active Fine-Tuning
☆87 · Updated 5 months ago
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆77 · Updated 8 months ago
keyonvafa / world-model-evaluation
☆61 · Updated 8 months ago
probcomp / LLaMPPL
A domain-specific probabilistic programming language for modeling and inference with language models
☆132 · Updated 3 months ago
brantondemoss / GrokkingComplexity
Code for
☆27 · Updated 7 months ago
microsoft / stop
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
☆44 · Updated last year
danielmamay / grokking
Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper.
☆38 · Updated last year
victorvikram / ConceptARC
Materials for ConceptARC paper
☆98 · Updated 8 months ago
EleutherAI / improved-t5
Experiments for efforts to train a new and improved t5
☆76 · Updated last year
likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆41 · Updated last year
joshuacnf / Ctrl-G
☆87 · Updated 6 months ago
allenai / infinigram-api
☆73 · Updated 2 weeks ago
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆32 · Updated 3 months ago
ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆41 · Updated 9 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆123 · Updated 7 months ago
allenai / discoverybench
Discovering Data-driven Hypotheses in the Wild
☆104 · Updated last month