thu-wyz / inference_scaling
☆61 · Updated 4 months ago
Alternatives and similar repositories for inference_scaling:
Users who are interested in inference_scaling compare it to the repositories listed below.
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆119 · Updated 6 months ago
- ☆49 · Updated last month
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆100 · Updated 3 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆56 · Updated 3 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆121 · Updated 8 months ago
- GenRM-CoT: Data release for verification rationales ☆53 · Updated 5 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆138 · Updated 2 weeks ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆46 · Updated 4 months ago
- A curated list of awesome resources dedicated to Scaling Laws for LLMs ☆71 · Updated last year
- ☆65 · Updated last year
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆107 · Updated last week
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆103 · Updated 3 weeks ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 · Updated 4 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆78 · Updated 3 weeks ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆130 · Updated last month
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆67 · Updated last week
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆28 · Updated last week
- Code for "Reasoning to Learn from Latent Thoughts" ☆77 · Updated this week
- ☆85 · Updated 3 weeks ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆171 · Updated 3 weeks ago
- ☆171 · Updated last month
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 · Updated 5 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆74 · Updated 2 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 6 months ago
- ☆144 · Updated 3 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆47 · Updated last week
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024) ☆51 · Updated 4 months ago
- ☆73 · Updated 2 weeks ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆84 · Updated last month
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g… ☆32 · Updated 3 months ago