rkinas / rlhf_thinking_model

This repository serves as a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It focuses on the latest research, methodologies, and techniques for fine-tuning language models.

☆95

Alternatives and similar repositories for rlhf_thinking_model

Users who are interested in rlhf_thinking_model are comparing it to the repositories listed below.

YuvrajSingh-mist / SmolLlama
So, I trained a 130M Llama architecture that I coded from the ground up to build a small instruct model from scratch. Trained on FineWeb dataset…
☆15 Updated 2 months ago
speakleash / speakleash
☆77 Updated last year
Locutusque / TPU-Alignment
Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free
☆231 Updated 7 months ago
cfahlgren1 / observers
A Lightweight Library for AI Observability
☆243 Updated 3 months ago
cognitivecomputations / kraken
☆66 Updated last year
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆53 Updated 6 months ago
Vaibhavs10 / notebooks
☆127 Updated 2 months ago
cognitivecomputations / spectrum
☆121 Updated 2 months ago
deep-diver / llamaduo
This project showcases an LLMOps pipeline that fine-tunes a small-size LLM model to prepare for the outage of the service LLM.
☆305 Updated 2 months ago
cohere-ai / cohere-finetune
A tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models
☆73 Updated 2 months ago
fairydreaming / farel-bench
Testing LLM reasoning abilities with family relationship quizzes.
☆61 Updated 4 months ago
smolorg / smoltropix
MLX port for xjdr's entropix sampler (mimics jax implementation)
☆64 Updated 7 months ago
teknium1 / ShareGPT-Builder
☆114 Updated 5 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆67 Updated 2 months ago
N8python / mlx-pretrain
A simple MLX implementation for pretraining LLMs on Apple Silicon.
☆76 Updated last month
cognitivecomputations / grokadamw
☆130 Updated 9 months ago
brendanhogan / picoDeepResearch
☆59 Updated 2 weeks ago
PrithivirajDamodaran / Route0x
Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da
☆105 Updated 2 months ago
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆78 Updated 5 months ago
cognitivecomputations / agi_memory
☆134 Updated 2 weeks ago
huggingface / Google-Cloud-Containers
Hugging Face Deep Learning Containers (DLCs) for Google Cloud
☆145 Updated last month
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 Updated 10 months ago
EveryInc / AI_Diplomacy
☆264 Updated this week
huggingface / data-is-better-together
Let's build better datasets, together!
☆259 Updated 5 months ago
muellerzr / minimal-trainer-zoo
Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines
☆197 Updated last year
willkurt / token-explorer
A simple tool that lets you explore different possible paths that an LLM might sample.
☆171 Updated last month
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆185 Updated this week
osanseviero / hackerllama
My personal site
☆75 Updated 10 months ago
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆37 Updated last year
AniruddhaChattopadhyay / Books
☆162 Updated 2 weeks ago