friederrr / GHOSTS
GHOSTS dataset
☆39Updated 2 years ago
Alternatives and similar repositories for GHOSTS
Users who are interested in GHOSTS are comparing it to the repositories listed below
- NaturalProofs: Mathematical Theorem Proving in Natural Language (NeurIPS 2021 Datasets & Benchmarks)☆134Updated 3 years ago
- A unified benchmark for math reasoning☆89Updated 3 years ago
- This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?”☆85Updated 3 years ago
- EMNLP 2022: Generating Natural Language Proofs with Verifier-Guided Search https://arxiv.org/abs/2205.12443☆86Updated last year
- ☆49Updated 2 years ago
- Distributional Generalization in NLP. A roadmap.☆88Updated 3 years ago
- NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks☆20Updated 3 years ago
- NaturalProver: Grounded Mathematical Proof Generation with Language Models☆39Updated 2 years ago
- ☆114Updated 3 years ago
- ☆46Updated 3 years ago
- Neural Unification for Logic Reasoning over Language☆22Updated 4 years ago
- PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022)☆76Updated 3 years ago
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.☆154Updated 4 months ago
- This repository contains some of the code used in the paper "Training Language Models with Language Feedback at Scale"☆27Updated 2 years ago
- Companion repo for "Evaluating Verifiability in Generative Search Engines".☆85Updated 2 years ago
- The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences".☆69Updated 2 years ago
- ☆145Updated last year
- Code for the paper "CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP" (https://arxiv.org/abs/2104.08835)☆113Updated 3 years ago
- ☆47Updated 2 years ago
- ☆29Updated last year
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"☆61Updated last year
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le…☆99Updated 4 years ago
- ☆141Updated 3 years ago
- Code for the paper "Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving"☆19Updated 2 years ago
- ☆59Updated last year
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"☆58Updated last year
- Repository for the code and dataset for the paper: "Have LLMs Advanced enough? Towards Harder Problem Solving Benchmarks For Large Langu…☆39Updated 2 years ago
- OpenPI dataset for tracking entities in open domain procedural text☆24Updated last year
- A Python Commonsense Knowledge Inference Toolkit☆64Updated 2 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆59Updated 3 years ago