Code for the paper "Self-Training Elicits Concise Reasoning in Large Language Models"
☆42 Apr 22, 2025 Updated 10 months ago
Alternatives and similar repositories for concise-reasoning
Users who are interested in concise-reasoning are comparing it to the repositories listed below
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs ☆52 Dec 7, 2025 Updated 2 months ago
- ScrollNet for Continual Learning ☆11 Sep 11, 2023 Updated 2 years ago
- [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Langua… ☆13 Nov 11, 2024 Updated last year
- [NeurIPS 2025 Spotlight] Implementation of "KLASS: KL-Guided Fast Inference in Masked Diffusion Models" ☆23 Jan 3, 2026 Updated 2 months ago
- ☆47 Jan 31, 2026 Updated last month
- ☆25 Dec 13, 2024 Updated last year
- Python package for generating datasets to evaluate reasoning and retrieval of large language models ☆20 Feb 23, 2026 Updated last week
- ☆24 Apr 3, 2025 Updated 11 months ago
- ☆17 Aug 1, 2025 Updated 7 months ago
- ☆19 Mar 3, 2025 Updated last year
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment ☆21 Apr 2, 2024 Updated last year
- Enlightener, the cutting-edge Retrieval-Augmented Generation (RAG) system that revolutionizes query responses. By combining the power of … ☆14 Jul 28, 2025 Updated 7 months ago
- ☆57 Updated this week
- ☆33 Nov 18, 2025 Updated 3 months ago
- Verifiers for LLM Reinforcement Learning ☆80 Apr 15, 2025 Updated 10 months ago
- OpenPipe Reinforcement Learning Experiments ☆32 Mar 14, 2025 Updated 11 months ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025) ☆35 Jul 16, 2025 Updated 7 months ago
- Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours ☆160 Feb 24, 2026 Updated last week
- [NeurIPS 2025 Spotlight] Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning ☆49 Jan 20, 2026 Updated last month
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆72 Feb 25, 2025 Updated last year
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆144 Nov 13, 2025 Updated 3 months ago
- AdaLLM is an NVFP4-first inference runtime for Ada Lovelace (RTX 4090) with FP8 KV cache and custom decode kernels. This repo targets NVF … ☆96 Feb 15, 2026 Updated 2 weeks ago
- ☆29 Jan 23, 2024 Updated 2 years ago
- ☆145 Sep 12, 2025 Updated 5 months ago
- Kate is a Multimodal Live Assistant that ignites your browsing experience ☆11 Feb 15, 2025 Updated last year
- Primus-SaFE (Stability and Fault Endurance) ☆52 Updated this week
- Repo for our course: Build a Drag-and-Drop Trello Board ☆11 Feb 12, 2024 Updated 2 years ago
- unsloth-5090-multiple ☆60 May 21, 2025 Updated 9 months ago
- Implementation of "RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation". ☆247 May 28, 2024 Updated last year
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆94 Nov 17, 2024 Updated last year
- Code release for "Imagination Mechanism: Mesh Information Propagation for Enhancing Data Efficiency in Reinforcement Learning" ☆13 Oct 7, 2023 Updated 2 years ago
- ☆11 Jan 11, 2022 Updated 4 years ago
- code for polite ☆11 Feb 28, 2024 Updated 2 years ago
- A collection of heat engines, based on the OpenAI Gym environment framework for use with reinforcement learning applications. ☆15 Dec 20, 2021 Updated 4 years ago
- The code for the paper "A Bayesian Approach to Online Planning" published in ICML 2024. ☆13 Jun 17, 2024 Updated last year
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models" ☆29 Feb 23, 2026 Updated last week
- ☆14 Mar 21, 2024 Updated last year
- Compare 2 basketball players by reading/comparing NBA stats in an Excel sheet. ☆11 Aug 19, 2018 Updated 7 years ago
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues ☆13 Feb 7, 2024 Updated 2 years ago