TergelMunkhbat / concise-reasoning
Code for the paper "Self-Training Elicits Concise Reasoning in Large Language Models"
☆22Updated 2 weeks ago
Alternatives and similar repositories for concise-reasoning:
Users who are interested in concise-reasoning are comparing it to the repositories listed below.
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆55Updated 7 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆84Updated last month
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335)☆29Updated last year
- ☆16Updated last month
- ☆23Updated 3 weeks ago
- ☆48Updated 5 months ago
- ☆24Updated 7 months ago
- Official Code Release for "Training a Generally Curious Agent"☆20Updated 2 weeks ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.☆29Updated 2 weeks ago
- ☆35Updated last month
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling☆29Updated 3 weeks ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)☆28Updated last month
- Official Repository of Are Your LLMs Capable of Stable Reasoning?☆25Updated last month
- FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models☆42Updated last week
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.☆22Updated 2 weeks ago
- ReBase: Training Task Experts through Retrieval Based Distillation☆29Updated 2 months ago
- ☆27Updated 3 weeks ago
- Repo for "Z1: Efficient Test-time Scaling with Code"☆52Updated last week
- The official code repo and data hub of top_nsigma sampling strategy for LLMs.☆24Updated 2 months ago
- A repository for research on medium sized language models.☆76Updated 10 months ago
- ☆50Updated 4 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"☆50Updated 4 months ago
- Complex Function Calling Benchmark.☆92Updated 2 months ago
- Train, tune, and run inference with the Bamba model☆88Updated 3 months ago
- ☆115Updated last month
- Official repo of paper LM2☆36Updated 2 months ago
- ☆62Updated 2 weeks ago
- Exploring Model Kinship for Merging Large Language Models☆23Updated this week
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆60Updated last month
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Updated last year