brendanhogan / picoDeepResearch

☆68

Alternatives and similar repositories for picoDeepResearch

Users who are interested in picoDeepResearch are comparing it to the libraries listed below.


s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆58 · Updated 3 weeks ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆109 · Updated 8 months ago
xjdr-alt / llmri
look how they massacred my boy
☆63 · Updated last year
SinatrasC / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆17 · Updated last year
haizelabs / j1-micro
j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆98 · Updated 3 months ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆58 · Updated 6 months ago
collinear-ai / spider
Craft post-training data recipes
☆53 · Updated this week
rosmineb / unit_test_rl
Project code for training LLMs to write better unit tests + code
☆21 · Updated 5 months ago
smolorg / smoltropix
MLX port of xjdr's entropix sampler (mimics the JAX implementation)
☆62 · Updated last year
N8python / mlx-pretrain
A simple MLX implementation for pretraining LLMs on Apple Silicon.
☆84 · Updated 2 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆80 · Updated 7 months ago
alexzhang13 / rlm
Super basic implementation (gist-like) of RLMs with REPL environments.
☆242 · Updated 3 weeks ago
BBischof / yapping
Verbosity control for AI agents
☆64 · Updated last year
PrimeIntellect-ai / genesys
☆135 · Updated 7 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 · Updated 9 months ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 · Updated 9 months ago
goncalorafaria / qalign
QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods.
☆24 · Updated 2 weeks ago
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆59 · Updated 6 months ago
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆56 · Updated 11 months ago
axolotl-ai-cloud / axolotl-cookbook
☆36 · Updated 3 months ago
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆210 · Updated 3 weeks ago
kurakurai / Luth
Luth is a state-of-the-art series of fine-tuned LLMs for French
☆38 · Updated 3 weeks ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 · Updated 9 months ago
LaunchPlatform / marketplace
Marketplace ML experiment - training without backprop
☆27 · Updated 2 months ago
QuixiAI / kraken
☆67 · Updated last year
axeld5 / pali_reason
Testing PaliGemma 2 fine-tuning on a reasoning dataset
☆18 · Updated 10 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆118 · Updated last year
PrimeIntellect-ai / prime-environments
Training-Ready RL Environments + Evals
☆164 · Updated this week
BhabhaAI / dataformer
Solving data for LLMs - Create quality synthetic datasets!
☆150 · Updated 9 months ago
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆103 · Updated 6 months ago