zhangir-azerbayev / proof-pile

Scripts for downloading and pre-processing the `proof-pile`, a high-quality dataset of mathematical text and code.

☆21

Alternatives and similar repositories for proof-pile

Users who are interested in proof-pile are comparing it to the repositories listed below.

kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆30 Updated 2 weeks ago
HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆27 Updated 6 months ago
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 Updated last year
iiis-ai / IterativeQuestionComposing
Official implementation of AAAI 2025 paper "Augmenting Math Word Problems via Iterative Question Composing" (https://arxiv.org/abs/2401.09…
☆20 Updated 7 months ago
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆44 Updated last year
scottlogic-alex / prm800k-denorm
Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format
☆27 Updated 2 years ago
DAMO-NLP-SG / CaRing
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs
☆38 Updated last year
whyNLP / Conic10K
Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP 2023 Findings.
☆27 Updated last year
Lagooon / LeanSTaR
☆41 Updated 10 months ago
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆18 Updated 6 months ago
OSU-NLP-Group / reversal-curse-binding
☆23 Updated 4 months ago
JHU-CLSP / RATIONALYST
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
☆35 Updated 10 months ago
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆55 Updated last year
sunyt32 / torchscale
Transformers at any scale
☆41 Updated last year
frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆58 Updated 2 years ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 Updated last year
RUCAIBox / BAMBOO
☆35 Updated last year
r-three / phatgoose
Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"
☆86 Updated last year
EleutherAI / semantic-memorization
☆44 Updated 8 months ago
zhaoxlpku / SubgoalXL
☆25 Updated 11 months ago
Strong-AI-Lab / Logical-and-abstract-reasoning
Evaluation on Logical Reasoning and Abstract Reasoning Challenges
☆28 Updated 3 months ago
EleutherAI / pile_dedupe
Pile Deduplication Code
☆19 Updated 2 years ago
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆81 Updated 11 months ago
yidingjiang / ado
The repository contains code for Adaptive Data Optimization
☆25 Updated 7 months ago
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35 Updated 10 months ago
allenai / bff
☆39 Updated last year
locuslab / scaling_laws_data_filtering
☆65 Updated last year
GSYfate / knnlm-limits
Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs"
☆23 Updated 3 months ago
nyu-mll / ILF-for-code-generation
☆78 Updated 4 months ago
HazyResearch / embroid
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
☆11 Updated last year