WORM-MAX / iGSM-Replication-physics-LLM
This repository replicates the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Allen-Zhu.
☆17Updated 8 months ago
Alternatives and similar repositories for iGSM-Replication-physics-LLM
Users who are interested in iGSM-Replication-physics-LLM are comparing it to the repositories listed below.
- ☆30Updated 6 months ago
- Code for "A Sober Look at Progress in Language Model Reasoning" paper☆45Updated this week
- GenRM-CoT: Data release for verification rationales☆60Updated 7 months ago
- Test-time-training on nearest neighbors for large language models☆41Updated last year
- ☆66Updated 5 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆105Updated 5 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆93Updated 2 months ago
- ☆61Updated last month
- Code for "Reasoning to Learn from Latent Thoughts"☆94Updated last month
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style☆40Updated last month
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆119Updated last month
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆58Updated last month
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆138Updated 3 months ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws☆54Updated 7 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality☆193Updated 9 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆120Updated 8 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers☆146Updated 2 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"☆62Updated 3 weeks ago
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"☆46Updated 4 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning☆34Updated last week
- Repo of paper "Free Process Rewards without Process Labels"☆147Updated 2 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆72Updated 7 months ago
- Implementation of CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation☆20Updated 3 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆134Updated 7 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆48Updated 6 months ago
- Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)☆18Updated this week
- Lightweight Adapting for Black-Box Large Language Models☆22Updated last year
- A curated list of awesome resources dedicated to Scaling Laws for LLMs☆71Updated 2 years ago
- ☆41Updated last year
- Implementation of the Quiet-STaR paper (https://arxiv.org/pdf/2403.09629.pdf)☆54Updated 9 months ago