rkinas / reasoning_models_how_to
This repository serves as a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It focuses on the latest research, methodologies, and techniques for fine-tuning language models.
☆98Updated this week
Alternatives and similar repositories for reasoning_models_how_to
Users who are interested in reasoning_models_how_to are comparing it to the repositories listed below
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free☆231Updated 7 months ago
- ☆74Updated 9 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon.☆80Updated last month
- ☆46Updated 2 months ago
- List of resources, libraries and more for developers who would like to build with open-source machine learning off-the-shelf☆199Updated last year
- ☆127Updated 3 months ago
- Exploring Applications of GRPO☆230Updated last month
- Simple examples using Argilla tools to build AI☆53Updated 7 months ago
- ☆133Updated 10 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆185Updated 3 weeks ago
- ☆143Updated this week
- An introduction to LLM Sampling☆78Updated 6 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.☆31Updated 2 months ago
- I trained a 130M Llama architecture that I coded from the ground up to build a small instruct model from scratch. Trained on FineWeb dataset…☆15Updated 3 months ago
- chrome & firefox extension to chat with webpages: local llms☆119Updated 6 months ago
- This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mast…☆354Updated 4 months ago
- Scripts, tutorials, and a programming knowledge base for working with the Bielik model.☆127Updated 3 weeks ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆68Updated 3 months ago
- Building GPT ...☆18Updated 6 months ago
- Here's all my Python/Numba (CUDA) code for the encoder block I made :)☆65Updated 2 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.☆41Updated last month
- Complete implementation of Llama2 with/without KV cache & inference 🚀☆47Updated last year
- Collection of resources for RL and Reasoning☆25Updated 4 months ago
- One click templates for inferencing Language Models☆190Updated last week
- A tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models☆73Updated 3 months ago
- ☆155Updated 2 months ago
- coding CUDA everyday!☆34Updated 2 months ago
- Train your own SOTA deductive reasoning model☆94Updated 3 months ago
- ☆43Updated this week
- ☆86Updated 9 months ago