strangeloopcanon / LOOP-Evals
Logical Operations On Puzzles: Simple Iterative Reasoning Tests for LLMs, starting with word grids
☆17, updated 2 months ago
Alternatives and similar repositories for LOOP-Evals:
Users who are interested in LOOP-Evals are comparing it to the libraries listed below.
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31, updated 3 years ago
- Factored Cognition Primer: How to write compositional language model programs ☆48, updated 2 years ago
- A library for squeakily cleaning and filtering language datasets. ☆47, updated last year
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27, updated last year
- ☆17, updated last year
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile. ☆15, updated last year
- ☆14, updated last year
- Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?" ☆33, updated 3 months ago
- ☆13, updated last year
- Probabilistic LLM evaluations. [CogSci2023; ACL2023] ☆73, updated 9 months ago
- Submission to the inverse scaling prize ☆23, updated last year
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14, updated 2 years ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32, updated 2 weeks ago
- Understanding how features learned by neural networks evolve throughout training ☆34, updated 6 months ago
- Byte-sized text games for code generation tasks on virtual environments ☆19, updated 9 months ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆33, updated 11 months ago
- ☆22, updated last year
- ☆34, updated last year
- ☆26, updated 2 years ago
- A programming language for formal/informal computation. ☆41, updated 2 weeks ago
- gzip Predicts Data-dependent Scaling Laws ☆34, updated 11 months ago
- ☆44, updated 5 months ago
- A Python library for automatically solving Abstraction and Reasoning Corpus (ARC) challenges using Claude and object-centric modeling. ☆21, updated 4 months ago
- Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21) ☆44, updated 3 years ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆16, updated last year
- Based on the tree of thoughts paper ☆48, updated last year
- ☆48, updated last year
- ☆31, updated 2 years ago
- Ludwig benchmark ☆20, updated 3 years ago
- LangCode - Improving alignment and reasoning of large language models (LLMs) with natural language embedded program (NLEP). ☆42, updated last year