paulosantosneto / unofficial-cot-decoding
Unofficial implementation of the CoT-decoding method for extracting CoT paths in an unsupervised way
☆23Updated 8 months ago
Alternatives and similar repositories for unofficial-cot-decoding
Users who are interested in unofficial-cot-decoding are comparing it to the libraries listed below.
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆91Updated 3 months ago
- ☆36Updated 4 months ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"☆73Updated 7 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆104Updated 4 months ago
- ☆100Updated last week
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆98Updated last month
- ☆107Updated 2 weeks ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.☆54Updated 2 weeks ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective☆64Updated 3 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆114Updated last year
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆89Updated last week
- ☆47Updated 3 months ago
- A curated paper list on LLM reasoning.☆88Updated last year
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆70Updated 2 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code"☆59Updated last month
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆54Updated 3 months ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement☆183Updated last year
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper☆35Updated 2 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆32Updated last year
- A curated list on the role of small models in the LLM era☆100Updated 8 months ago
- ☆105Updated 2 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings"☆59Updated 4 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆155Updated this week
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F…☆64Updated last year
- ☆83Updated 3 weeks ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"☆126Updated 9 months ago
- ☆41Updated 7 months ago
- ☆86Updated 7 months ago
- ☆45Updated 2 weeks ago
- This repo aims to record resources on role-playing abilities in LLMs, including datasets, papers, applications, etc.☆124Updated 8 months ago