HarleyCoops / smolThinker-.5B
A Qwen .5B reasoning model trained on OpenR1-Math-220k
☆12 · Updated last month
Alternatives and similar repositories for smolThinker-.5B:
Users interested in smolThinker-.5B are comparing it to the repositories listed below.
- ☆11 · Updated 8 months ago
- ☆48 · Updated 4 months ago
- Collection of autoregressive model implementations ☆83 · Updated last month
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆76 · Updated 3 weeks ago
- Simple GRPO scripts and configurations ☆59 · Updated last month
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆91 · Updated 3 weeks ago
- Optimizing causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 · Updated last month
- Testing PaliGemma 2 fine-tuning on a reasoning dataset ☆18 · Updated 3 months ago
- Train your own SOTA deductive reasoning model ☆81 · Updated 3 weeks ago
- My fork of Allen AI's OLMo for educational purposes ☆30 · Updated 3 months ago
- A repository for research on medium-sized language models ☆76 · Updated 10 months ago
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated last month
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆91 · Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 7 months ago
- ☆74 · Updated 7 months ago
- ☆83 · Updated last month
- ☆111 · Updated last month
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆56 · Updated 2 weeks ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆26 · Updated 3 weeks ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 10 months ago
- ☆24 · Updated 6 months ago
- ☆31 · Updated 2 months ago
- A working implementation of DeepSeek MLA ☆39 · Updated 2 months ago
- Lego for GRPO ☆25 · Updated 2 weeks ago
- My implementation of Q-Sparse: All Large Language Models Can Be Fully Sparsely-Activated ☆31 · Updated 7 months ago
- Entropy Based Sampling and Parallel CoT Decoding ☆17 · Updated 5 months ago
- entropix-style sampling + GUI ☆25 · Updated 5 months ago
- ☆16 · Updated last month
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆28 · Updated last week
- ☆47 · Updated last week