menhguin / minp_paper
Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper
☆33 · Updated last month
Alternatives and similar repositories for minp_paper:
- A repository for research on medium-sized language models. ☆76 · Updated 11 months ago
- The official code repo and data hub of the top_nsigma sampling strategy for LLMs. ☆24 · Updated 2 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆45 · Updated 2 weeks ago
- Codebase for Instruction Following without Instruction Tuning ☆34 · Updated 7 months ago
- Unofficial implementation of Chain-of-Thought Reasoning Without Prompting ☆32 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 · Updated last month
- Knowledge Unlearning for Large Language Models ☆25 · Updated this week
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆35 · Updated 2 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆25 · Updated 6 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 · Updated last month
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆29 · Updated last month
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv 2401.01335) ☆29 · Updated last year
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 · Updated 3 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 8 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Updated last year
- Official repository of "Are Your LLMs Capable of Stable Reasoning?" ☆25 · Updated last month
- [Preprint] An inference-time decoding strategy with adaptive foresight sampling ☆90 · Updated 2 weeks ago
- Official repository for Inheritune. ☆111 · Updated 2 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆57 · Updated 3 weeks ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆34 · Updated last year
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆85 · Updated last month
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆54 · Updated last year
- PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24) ☆59 · Updated last year