wenhuchen / TheoremQA
The dataset and code for the paper "TheoremQA: A Theorem-driven Question Answering dataset".
☆157 · Updated 11 months ago
Alternatives and similar repositories for TheoremQA:
Users who are interested in TheoremQA are comparing it to the libraries listed below.
- ☆122 · Updated 4 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆136 · Updated 4 months ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆157 · Updated 10 months ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… ☆111 · Updated last year
- ☆159 · Updated 2 years ago
- Self-Alignment with Principle-Following Reward Models ☆156 · Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆79 · Updated 7 months ago
- A unified benchmark for math reasoning ☆87 · Updated 2 years ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 6 months ago
- ☆83 · Updated 2 months ago
- ☆172 · Updated last year
- This is the repo for the paper "Shepherd: A Critic for Language Model Generation" ☆218 · Updated last year
- ☆115 · Updated 8 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆218 · Updated 4 months ago
- Simple next-token-prediction for RLHF ☆222 · Updated last year
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023) ☆30 · Updated 10 months ago
- Pretraining Efficiently on S2ORC! ☆158 · Updated 5 months ago
- A set of utilities for running few-shot prompting experiments on large language models ☆118 · Updated last year
- Code for the ACL 2023 paper: Pre-Training to Learn in Context ☆108 · Updated 7 months ago
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task. ☆138 · Updated 5 months ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated last month
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆158 · Updated 9 months ago
- ☆138 · Updated last year
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ☆179 · Updated 5 months ago
- Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought" ☆94 · Updated last year
- Contrastive decoding ☆196 · Updated 2 years ago
- ☆120 · Updated 9 months ago
- Unofficial implementation of AlpaGasus ☆90 · Updated last year
- This project studies the performance and robustness of language models and task-adaptation methods. ☆147 · Updated 10 months ago
- Official implementation of the paper "Autonomous Data Selection with Language Models for Mathematical Texts" (As Huggingface Daily Papers: ht… ☆80 · Updated 4 months ago