rkinas / rlhf_thinking_model
This repository serves as a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It focuses on the latest research, methodologies, and techniques for fine-tuning language models.
☆95 Updated 2 months ago
Alternatives and similar repositories for rlhf_thinking_model
Users who are interested in rlhf_thinking_model are comparing it to the repositories listed below.
- So, I trained a 130M Llama architecture I coded from the ground up to build a small instruct model from scratch. Trained on FineWeb dataset… ☆15 Updated 2 months ago
- ☆77 Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆231 Updated 7 months ago
- A Lightweight Library for AI Observability ☆243 Updated 3 months ago
- ☆66 Updated last year
- Simple examples using Argilla tools to build AI ☆53 Updated 6 months ago
- ☆127 Updated 2 months ago
- ☆121 Updated 2 months ago
- This project showcases an LLMOps pipeline that fine-tunes a small LLM to prepare for an outage of the service LLM. ☆305 Updated 2 months ago
- A tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models ☆73 Updated 2 months ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆61 Updated 4 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆64 Updated 7 months ago
- ☆114 Updated 5 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆67 Updated 2 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆76 Updated last month
- ☆130 Updated 9 months ago
- ☆59 Updated 2 weeks ago
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆105 Updated 2 months ago
- An introduction to LLM Sampling ☆78 Updated 5 months ago
- ☆134 Updated 2 weeks ago
- Hugging Face Deep Learning Containers (DLCs) for Google Cloud ☆145 Updated last month
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 10 months ago
- ☆264 Updated this week
- Let's build better datasets, together! ☆259 Updated 5 months ago
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines ☆197 Updated last year
- A simple tool that lets you explore different possible paths that an LLM might sample. ☆171 Updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆185 Updated this week
- My personal site ☆75 Updated 10 months ago
- Set of scripts to finetune LLMs ☆37 Updated last year
- ☆162 Updated 2 weeks ago