lz1oceani / verify_cot
☆130 · Updated last year
Alternatives and similar repositories for verify_cot:
Users who are interested in verify_cot are comparing it to the libraries listed below.
- ☆172 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆154 · Updated 11 months ago
- TART: A plug-and-play Transformer module for task-agnostic reasoning ☆195 · Updated last year
- ☆120 · Updated 8 months ago
- Simple next-token-prediction for RLHF ☆222 · Updated last year
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆157 · Updated 9 months ago
- Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought" ☆94 · Updated last year
- Official implementation of InstructZero, the first framework to optimize bad prompts of ChatGPT (API LLMs) and finally obtain good prompts… ☆187 · Updated 6 months ago
- ☆150 · Updated last year
- [NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents ☆305 · Updated 5 months ago
- This is the repo for the paper "Shepherd: A Critic for Language Model Generation" ☆218 · Updated last year
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - NeurIPS 2023 (oral) ☆256 · Updated 10 months ago
- A codebase for "Language Models can Solve Computer Tasks" ☆232 · Updated 9 months ago
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆229 · Updated last year
- ☆117 · Updated 4 months ago
- ☆81 · Updated last year
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆103 · Updated last week
- ☆114 · Updated 7 months ago
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… ☆70 · Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆77 · Updated 6 months ago
- The dataset and code for the paper TheoremQA: A Theorem-driven Question Answering dataset ☆157 · Updated 9 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆208 · Updated 9 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆172 · Updated 3 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆281 · Updated 9 months ago
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate ☆403 · Updated last year
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 5 months ago
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆204 · Updated last year
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI ☆93 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆215 · Updated 10 months ago
- ☆160 · Updated last year