[ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts (featured in Hugging Face Daily Papers: https://huggingface.co/papers/2402.07625)
☆90 · Nov 23, 2025 · Updated 3 months ago
Alternatives and similar repositories for AutoMathText
Users that are interested in AutoMathText are comparing it to the libraries listed below
- ☆30Dec 27, 2024Updated last year
- ☆167May 2, 2024Updated last year
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.☆459Apr 18, 2024Updated last year
- The code and data for the paper JiuZhang3.0☆49May 26, 2024Updated last year
- ☆71Oct 16, 2024Updated last year
- ☆18Apr 5, 2025Updated 10 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨☆273Apr 26, 2024Updated last year
- ☆26Jul 16, 2025Updated 7 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆149Oct 27, 2024Updated last year
- ☆109Jul 15, 2025Updated 7 months ago
- Introducing Filtered Direct Preference Optimization (fDPO) that enhances language model alignment with human preferences by discarding lo…☆16Nov 27, 2024Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models☆201Dec 8, 2025Updated 2 months ago
- ☆64Apr 9, 2024Updated last year
- [NAACL 2024] A Synthetic, Scalable and Systematic Evaluation Suite for Large Language Models☆33Jun 10, 2024Updated last year
- The Mix of Minimal Optimal Sets (MMOS) dataset has two advantages: higher performance and lower construction costs on math…☆74Jul 27, 2024Updated last year
- Language models scale reliably with over-training and on downstream tasks☆100Apr 2, 2024Updated last year
- ☆565Nov 20, 2024Updated last year
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale☆266Jul 8, 2025Updated 7 months ago
- ☆35Jan 10, 2025Updated last year
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"☆316Dec 20, 2023Updated 2 years ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆147Sep 20, 2024Updated last year
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models☆454Feb 1, 2024Updated 2 years ago
- Large language models designed for formal theorem proving through tool-integrated reasoning.☆33Aug 13, 2025Updated 6 months ago
- ☆76Jan 8, 2026Updated last month
- ☆342Jun 5, 2025Updated 8 months ago
- The official repository of the Omni-MATH benchmark.☆93Dec 22, 2024Updated last year
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.☆64Nov 27, 2024Updated last year
- ☆84Apr 18, 2024Updated last year
- This is the official implementation for MA-LoT.☆19Aug 4, 2025Updated 6 months ago
- [NeurIPS D&B 2024] Generative AI for Math: MathPile☆419Apr 4, 2025Updated 10 months ago
- ☆43Sep 19, 2024Updated last year
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models☆270Sep 12, 2024Updated last year
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".☆230Aug 28, 2024Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆588Dec 9, 2024Updated last year
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆30Mar 5, 2024Updated last year
- BERT score for text generation☆12Jan 15, 2025Updated last year
- ☆12Oct 5, 2022Updated 3 years ago
- Get aid from local LLMs right in your PowerShell☆15May 2, 2025Updated 10 months ago
- ☆12Apr 22, 2024Updated last year