rkinas / reasoning_models_how_to
This repository serves as a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It focuses on the latest research, methodologies, and techniques for fine-tuning language models.
☆105 Updated 2 weeks ago
Alternatives and similar repositories for reasoning_models_how_to
Users who are interested in reasoning_models_how_to are comparing it to the repositories listed below.
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆381 Updated this week
- GRadient-INformed MoE ☆264 Updated 10 months ago
- ☆155 Updated 3 months ago
- All credits go to HuggingFace's Daily AI papers (https://huggingface.co/papers) and the research community. Audio summaries here (https… ☆191 Updated this week
- ☆261 Updated last month
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆276 Updated last year
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆222 Updated this week
- Inference, Fine Tuning and many more recipes with Gemma family of models ☆265 Updated 3 weeks ago
- So, I trained a Llama a 130M architecture I coded from ground up to build a small instruct model from scratch. Trained on FineWeb dataset… ☆15 Updated 4 months ago
- ☆86 Updated 10 months ago
- ☆169 Updated last week
- One click templates for inferencing Language Models ☆203 Updated this week
- ☆46 Updated 4 months ago
- Complete implementation of Llama2 with/without KV cache & inference ☆48 Updated last year
- Simple examples using Argilla tools to build AI ☆53 Updated 8 months ago
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆313 Updated 3 weeks ago
- Build datasets using natural language ☆507 Updated 3 months ago
- From data to vector database effortlessly ☆79 Updated 2 months ago
- ☆154 Updated last month
- ☆65 Updated 2 months ago
- Fine tune Gemma 3 on an object detection task ☆74 Updated 3 weeks ago
- ☆134 Updated 11 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 6 months ago
- Scrape and export data from the Open LLM Leaderboard. ☆45 Updated 7 months ago
- ☆75 Updated 10 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆83 Updated 3 months ago
- Solving data for LLMs - Create quality synthetic datasets! ☆150 Updated 6 months ago
- ☆125 Updated 3 weeks ago
- Simple UI for debugging correlations of text embeddings ☆288 Updated 2 months ago
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok … ☆24 Updated 3 weeks ago