HarleyCoops / OneShotGRPO

One click away from a locally downloaded, fine-tuned model, hosted on Hugging Face, with inference built in. In two hours.

☆23

Alternatives and similar repositories for OneShotGRPO

Users who are interested in OneShotGRPO are comparing it to the repositories listed below.

brendanhogan / picoDeepResearch
☆68 Updated 5 months ago
AgnostiqHQ / multi-agent-llm
Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)
☆120 Updated 8 months ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 Updated 9 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆109 Updated 7 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆58 Updated 2 weeks ago
ALucek / GRPO-Training
An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning
☆37 Updated 5 months ago
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆103 Updated 6 months ago
alexzhang13 / rlm
Super basic implementation (gist-like) of RLMs with REPL environments.
☆228 Updated 2 weeks ago
davanstrien / data-for-fine-tuning-llms
☆80 Updated last year
Arize-ai / prompt-learning
☆115 Updated last week
ai8hyf / OpenResearchAssistant
An automated tool for discovering insights from research paper corpora
☆138 Updated last year
ali-bahrainian / RAG_best_practices
☆96 Updated 7 months ago
weaviate / structured-rag
Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models
☆111 Updated 6 months ago
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆56 Updated 11 months ago
yueqis / API-Based-Agent
☆58 Updated 4 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 Updated 8 months ago
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆38 Updated last year
kurakurai / Luth
Luth is a state-of-the-art series of fine-tuned LLMs for French
☆37 Updated 3 weeks ago
apple / ml-superposition-prompting
☆146 Updated last year
AK391 / dailypapersHN
☆86 Updated last year
janhq / verifiers-deepresearch
Verifiers for LLM Reinforcement Learning
☆77 Updated last month
HazyResearch / eclair-agents
Automating enterprise workflows with multimodal agents
☆112 Updated last year
writer / writing-in-the-margins
☆119 Updated last year
AstraBert / ragcoon
Agentic RAG to help you build a startup 🚀
☆55 Updated 6 months ago
ALucek / LLM-distillation-guide
☆25 Updated last year
dottxt-ai / demos
☆121 Updated last month
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆59 Updated 5 months ago
axolotl-ai-cloud / axolotl-cookbook
☆36 Updated 3 months ago
jina-ai / correlations
Simple UI for debugging correlations of text embeddings
☆296 Updated 5 months ago
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆50 Updated last year