ekinakyurek / gpt3-arithmetic
Scratchpad/Chain-of-Thought Prompts
☆12 Updated 2 years ago
Alternatives and similar repositories for gpt3-arithmetic:
Users who are interested in gpt3-arithmetic are comparing it to the repositories listed below.
- The LM Contamination Index is a manually created database of contamination evidence for LMs. ☆77 Updated 10 months ago
- Supporting code for ReCEval paper ☆28 Updated 5 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆46 Updated last year
- Code for the paper "Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving" ☆17 Updated last year
- ☆39 Updated 6 months ago
- Official implementation of AAAI 2025 paper "Augmenting Math Word Problems via Iterative Question Composing" (https://arxiv.org/abs/2401.09… ☆18 Updated 2 months ago
- The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences". ☆69 Updated last year
- Source code and data for The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code (Findings of ACL 2023… ☆29 Updated last year
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076. ☆25 Updated last year
- ☆33 Updated 8 months ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 Updated 3 weeks ago
- ☆22 Updated 3 months ago
- [ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks ☆52 Updated last year
- Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge" ☆74 Updated last year
- [EMNLP'24 (Main)] DRPO (Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. DRPO leverages a search-… ☆20 Updated 3 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆58 Updated 4 months ago
- ☆18 Updated 8 months ago
- A unified benchmark for math reasoning ☆87 Updated 2 years ago
- The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning". ☆63 Updated last year
- ☆22 Updated 5 months ago
- ☆45 Updated last year
- Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code. ☆19 Updated 2 years ago
- ☆93 Updated last year
- ☆39 Updated 2 years ago
- ☆34 Updated 10 months ago
- ☆26 Updated 7 months ago
- ☆27 Updated 11 months ago
- ☆33 Updated 10 months ago
- Evaluate the Quality of Critique ☆35 Updated 8 months ago