strangeloopcanon / LOOP-Evals
Logical Operations On Puzzles: Simple Iterative Reasoning Tests for LLMs first through wordgrids
☆17 Updated last month
Alternatives and similar repositories for LOOP-Evals:
Users who are interested in LOOP-Evals are comparing it to the repositories listed below:
- Understanding how features learned by neural networks evolve throughout training ☆33 Updated 5 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 Updated last year
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆37 Updated last week
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆25 Updated 3 months ago
- Few-shot Learning with Auxiliary Data ☆27 Updated last year
- PyTorch implementation for MRL ☆18 Updated last year
- Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning" ☆33 Updated 2 months ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 Updated 3 years ago
- ☆17 Updated last year
- ☆44 Updated 4 months ago
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆23 Updated 2 years ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14 Updated last year
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile. ☆15 Updated last year
- ☆34 Updated last year
- Minimum Description Length probing for neural network representations ☆19 Updated last month
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆15 Updated last year
- Based on the tree of thoughts paper ☆47 Updated last year
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆37 Updated last month
- ☆23 Updated 6 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆52 Updated 3 months ago
- Submission to the inverse scaling prize ☆23 Updated last year
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆32 Updated 4 months ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 Updated last week
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ☆28 Updated 2 years ago
- Aioli: A unified optimization framework for language model data mixing ☆22 Updated 2 months ago
- ☆25 Updated 5 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆70 Updated 3 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆47 Updated last year
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆30 Updated last week
- Training hybrid models for dummies. ☆20 Updated 2 months ago